ABSTRACT

On-line retrieval system design is discussed in the two papers which make up Part Five of this report on Salton's Magical Automatic Retriever of Texts (SMART) project. The first paper, "A Prototype On-Line Document Retrieval System" by D. Williamson and R. Williamson, outlines a design for a SMART on-line document retrieval system using console-initiated search and retrieval procedures. It describes both the conversational system and the program organization. The second paper, "Template Analysis in a Conversational System" by S. F. Weiss, discusses natural language conversational systems. The use of natural language makes it possible to implement a natural dialogue system and makes the system available to a wide range of users. A set of goals for such a system is presented. An experimental conversational system is implemented using a template analysis process, and both user and system performance are discussed in detail. (For the entire SMART project report see LI 002 719, and for parts 1-4 see LI 002 720 through LI 002 723.) (NH)
This summary discusses all 5 parts of Information Storage and Retrieval (ISR-18), which is available in its entirety as LI 002 719. Only the papers from Part Five are reproduced here as LI 002 724. See LI 002 720 through LI 002 723 for parts 1-4.

Summary 

The present report is the eighteenth in a series describing research in automatic information storage and retrieval conducted by the Department of Computer Science at Cornell University. The report, covering work carried out by the SMART project for approximately one year (summer 1969 to summer 1970), is separated into five parts: automatic content analysis (Sections I to IV), automatic dictionary construction (Sections V to VII), user feedback procedures (Sections VIII to XI), document and query clustering methods (Sections XII and XIII), and SMART systems design for on-line operations (Sections XIV and XV).

Most recipients of SMART project reports will notice a gap in the series of scientific reports received to date. Report ISR-17, consisting of a master's thesis by Thomas Brauen entitled "Document Vector Modification in On-line Information Retrieval Systems", was prepared for limited distribution during the fall of 1969. Report ISR-17 is available from the National Technical Information Service in Springfield, Virginia 22151, under order number PB 186-135.

The SMART system continues to operate in a batch processing mode on the IBM 360 model 65 system at Cornell University. This standard processing mode is eventually to be replaced by an on-line system using time-shared console devices for input and output. The overall design for such an on-line version of SMART has been completed and is described in Section XIV of the present report. While awaiting the time-sharing implementation of the system, new retrieval experiments have been performed using larger document collections within the existing system. Attempts to compare the performance
of several collections of different sizes must take into account the collection "generality". A study of this problem is made in Section II of the present report. Also of special interest may be the new procedures for the automatic recognition of "common" words in English texts (Section VI), and the automatic construction of thesauruses and dictionaries for use in an automatic language analysis system (Section VII). Finally, a new, inexpensive method of document classification and term grouping is described and evaluated in Section XII of the present report.

Sections I to IV cover experiments in automatic content analysis and automatic indexing. Section I by S. F. Weiss contains the results of experiments using statistical and syntactic procedures for the automatic recognition of phrases in written texts. It is shown once again that, because of the relative heterogeneity of most document collections and the sparseness of the document space, phrases are not normally needed for content identification.

In Section II by G. Salton, the "generality" problem, which arises when two or more distinct collections are compared in a retrieval environment, is examined. It is shown that proportionately fewer nonrelevant items tend to be retrieved when larger collections (of low generality) are used than when small, high-generality collections serve for evaluation purposes. The systems viewpoint thus normally favors the larger, low-generality output, whereas the user viewpoint prefers the performance of the smaller collection.

The effectiveness of bibliographic citations for content analysis purposes is examined in Section III by G. Salton. It is shown that in some situations, when the citation space is reasonably dense, the use of citations attached to documents is even more effective than the use of standard keywords or descriptors. In any case, citations should be added to the normal descriptors whenever they happen to be available.

In the last section of Part 1 (Section IV by S. F. Weiss), certain template analysis methods are applied to the automatic resolution of ambiguous constructions. It is shown that a set of contextual rules can be constructed by a semi-automatic learning process, which will eventually lead to the automatic recognition of over ninety percent of the existing textual ambiguities.

Part 2, consisting of Sections V, VI and VII, covers procedures for the automatic construction of dictionaries and thesauruses useful in text analysis systems. In Section V by D. Bergmark, it is shown that word stem methods using large common word lists are more effective in an information retrieval environment than some manually constructed thesauruses, even though the latter also include synonym recognition facilities.

A new model for the automatic determination of "common" words (which are not to be used for content identification) is proposed and evaluated in Section VI by K. Bonwit and J. Aste-Tonsmann. The resulting process can be incorporated into fully automatic dictionary construction systems. The complete thesaurus construction problem is reviewed in Section VII by G. Salton, and the effectiveness of a variety of automatic dictionaries is evaluated.

Part 3, consisting of Sections VIII through XI, deals with a number of refinements of the normal relevance feedback process, which has been examined in a number of previous reports in this series. In Section VIII by T. P. Baker, a query splitting process is evaluated in which input queries are split into two or more parts during feedback whenever the relevant documents identified by the user are separated by one or more nonrelevant ones.

The effectiveness of relevance feedback techniques in an environment of variable generality is examined in Section IX by B. Capps and M. Yin. It is shown that some of the feedback techniques are equally applicable to collections of small and large generality. Techniques of negative feedback (when the user identifies no relevant items, only nonrelevant ones) are considered in Section X by M. Kerchnar. It is shown that a number of selective negative techniques, in which only certain specific concepts are actually modified during the feedback process, bring good improvements in retrieval effectiveness over the standard nonselective methods.

Finally, a new feedback methodology, in which a number of documents jointly identified as relevant to earlier queries are used as a set for relevance feedback purposes, is proposed and evaluated in Section XI by L. Paavola.

Two new clustering techniques are examined in Part 4 of this report, consisting of Sections XII and XIII. A controlled, inexpensive, single-pass clustering algorithm is described and evaluated in Section XII by D. B. Johnson and J. M. Lafuente. In this clustering method, each document is examined only once, and the procedure is shown to be equivalent in certain circumstances to other, more demanding clustering procedures.

The query clustering process, in which query groups are used to define the information search strategy, is studied in Section XIII by S. Worona. A variety of parameter values for cluster generation, centroid definition, and final search strategy is evaluated in a retrieval environment.

The last part, number five, consisting of Sections XIV and XV, covers the design of on-line information retrieval systems. A new SMART system design for on-line use is proposed in Section XIV by D. and R. Williamson, based on the concepts of pseudo-batching and the interaction of a cycling program with a console monitor. The user interface and conversational facilities are also described.

A template analysis technique is used in Section XV by S. F. Weiss for the implementation of conversational retrieval systems used in a time-sharing environment. The effectiveness of the method is discussed, as well as its implementation in a retrieval situation.

Additional automatic content analysis and search procedures used with the SMART system are described in several previous reports in this series, notably reports ISR-11 to ISR-16, published between 1966 and 1969. These reports are all available from the National Technical Information Service in Springfield, Virginia.
XIV. A Prototype On-Line Document Retrieval System
D. Williamson and R. Williamson



Abstract 

A design is outlined for a SMART on-line document retrieval system using console-initiated search and retrieval procedures. The conversational system is described, as well as the program organization.

1. Introduction 

The SMART system presently contains routines for experimental, off-line document retrieval. The experimental results obtained so far indicate that automatic document retrieval can provide useful information for general library users. The next logical step is the development of a suitable user-oriented interface providing interactive access via on-line consoles.

This report describes a prototype on-line document retrieval system and a user interface. The system outlined here is intended to provide the best possible service to on-line users at a reasonable cost, but it could also be used efficiently, with very few modifications, as a batch or remote entry system. While initial tests are anticipated with collections of only a few thousand documents and fewer than five consoles, the mechanisms used are intended to be applicable without revision to much larger collections of about 500,000 documents, and to one or two hundred input-output consoles.

2. Anticipated Computer Configuration

In order to provide adequate response times (about 10 seconds for minor inputs and about 30 seconds for responses to search commands), a large, high-speed computer is necessary. Document retrieval, like many other non-numeric processes, requires a large data base of which a small, but substantial, fraction must be accessed for each query. Thus, it is necessary to operate with large on-line files, presumably on disk (although certain files could be placed on a data cell type device).



While a large computer is necessary to support the input-output equipment and provide reasonable response times, an on-line retrieval system such as SMART will not be able to utilize the full resources of a large machine. First, there will be periods when no users wish to avail themselves of the on-line system; and even when actual users are present, most of the real time of an interaction is spent waiting for user decisions. Also, while processing a search request, the computer may be expected to be input-output (I-O) bound, waiting for vocabularies and documents to be brought into core.

If processing costs are to be reasonable, provision must be made to let non-retrieval users process while the retrieval system is inactive for one reason or another. The type of environment needed is typified by many of the multi-processing and time-sharing systems available on large machines today. With these systems, jobs are effectively allocated to two queues: most are awaiting execution, and a few are in execution. Those in execution share the central processor (CPU), memory, and on-line storage devices. Each memory area and storage device is usually dedicated to a single job. (In addition, a few devices and storage areas are normally reserved for the supervisor, which is used by all jobs.) CPU allocation is normally switched from one executing job to another (through the supervisor) whenever that job is blocked, usually because it must await completion of an I-O transmission. System-imposed blocks prevent a job from monopolizing the CPU when it has not blocked on its own for a certain time.
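The switching rule just described can be illustrated with a small simulation. This is a minimal sketch, not the supervisor of any particular operating system: it assumes a single CPU, a simple round-robin order among executing jobs, instantaneous I-O completion, and a hypothetical `time_limit` after which the supervisor forces a block. Each job's work is given as a list of CPU bursts, each of which ends in an I-O request.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    # CPU time (ms) needed before each successive I-O request; hypothetical units.
    bursts: list = field(default_factory=list)


def run(jobs, time_limit):
    """Round-robin among executing jobs (mutates each job's bursts).

    A job gives up the CPU when its current burst ends (it blocks for I-O,
    or finishes if no bursts remain) or when the supervisor forces a block
    after `time_limit` ms without one. I-O is assumed to complete instantly,
    so a blocked job simply rejoins the end of the queue.
    Returns one (job name, ms used, reason) entry per turn on the CPU.
    """
    ready = deque(job for job in jobs if job.bursts)
    trace = []
    while ready:
        job = ready.popleft()
        need = job.bursts[0]
        if need <= time_limit:
            job.bursts.pop(0)
            reason = "I-O block" if job.bursts else "done"
            trace.append((job.name, need, reason))
        else:
            job.bursts[0] = need - time_limit
            trace.append((job.name, time_limit, "forced block"))
        if job.bursts:
            ready.append(job)
    return trace


if __name__ == "__main__":
    smart = Job("SMART", [5, 5])   # short, I-O bound retrieval job
    batch = Job("BATCH", [250])    # long, CPU bound batch job
    for turn in run([smart, batch], time_limit=100):
        print(turn)
```

With a short, I-O bound retrieval job next to a long, compute-bound batch job, the retrieval job gets the CPU at each turn and releases it quickly, while the batch job is repeatedly cut off at the time limit.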

In the normal course of events, each executing job receives the opportunity to use the CPU several times a minute. Much of the time, a retrieval process such as SMART will be unable to use this opportunity to process. However, when SMART has work to do, and the information necessary to do that work is available, the CPU is normally accessible effectively instantaneously. The reason is that the retrieval tasks will appear as highly I-O bound jobs, which are therefore core resident for long periods of time and are usually high in priority for CPU access.

SMART can make efficient use of as much core storage as can be made available. However, the retrieval routines tend to be small and are highly overlayable; thus, the basic core area requirements are quite small. As in other typical data processing applications, the major core requirements in a retrieval program are for data areas in which to place I-O buffers for dictionaries, documents, etc. It would be most desirable if SMART could obtain 100 K to ?00 K of core (possibly from a bulk core rather than from the high-speed main core) on demand, for periods of only several seconds each time a request (or group of pseudo-batched requests) is processed.

This core could easily come from the system buffer pool. However, sharing of core in this way is not a normal feature of today's operating systems; thus, SMART will undoubtedly have to reserve an area of high-speed core for programs (25-30 K bytes), and an area of bulk core for data (at least 50 K bytes; however, the more core is available, the faster the obtainable response times will normally be).

3. On-Line Document Retrieval: A User's View

When control of a console is transferred to SMART, the remote unit should be titled clearly to indicate to the user what basic information is needed at each step (detailed information should be provided in a user's manual).

If SMART is on-line at the time of console transfer, the user must first enter such basic information as his name and account number (see Fig. 1). After this information is accepted by SMART, the user can proceed to ask for the execution of a given process. Many processes, such as query searches, query updates, and displays of output, are available.

An initial user will probably start with a single query search (as shown in Fig. 1). In this case, he will type in his query and then ask for a search to be done. The results will be displayed (in one of several possible forms, such as titles, abstracts, etc.), and the user will then either get a further display of the documents or use the results of the search at that point.
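To make the ranked display concrete, here is a minimal sketch of a search that scores every document against the query and lists the documents by decreasing correlation, in the manner of the Fig. 1 results. It assumes that queries and documents are weighted term vectors held in Python dictionaries, and it uses the cosine correlation commonly associated with SMART; this section does not give the formula, and the sample terms are invented.

```python
import math


def cosine(query, doc):
    """Cosine correlation between two sparse term -> weight vectors."""
    dot = sum(w * doc.get(term, 0.0) for term, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def search(query, docs, top=5):
    """Return (rank, document id, correlation) for the `top` best documents."""
    scored = sorted(((doc_id, cosine(query, vec)) for doc_id, vec in docs.items()),
                    key=lambda pair: pair[1], reverse=True)
    return [(rank, doc_id, round(corr, 4))
            for rank, (doc_id, corr) in enumerate(scored[:top], start=1)]


if __name__ == "__main__":
    documents = {  # hypothetical document vectors
        "d1": {"information": 1.0, "science": 1.0},
        "d2": {"information": 1.0, "retrieval": 1.0},
        "d3": {"library": 1.0},
    }
    for row in search({"information": 1.0, "science": 1.0}, documents):
        print(row)
```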

Several types of display for retrieved documents could be used. The volume of information included in abstracts (or full articles) is likely to be so large that teletype display will be impractically slow; cathode-ray tube display, however, is quite expensive. Storage of abstracts at the remote terminal is an attractive alternative, with storage either on microfiche cards or in computer listings.

$PROCEED
-SMART
#SMART
#What is your name?-
-Joe Cornell
#What is your access code?-
-NONE
#Your access code is "MNAIZ".
#Do you wish to enter a query?-
-Yes.
#Please enter your 1th query.
#Type "End of query." when finished.
-What articles are there in ...
End of query.
#Is your query ready for analysis?-
-Yes.

A Typical User's First Query

Fig. 1
Rank  Article   Correlation
1     60x1212   0.6708   L. B. Heilprin, Towards a Definition of Information Science
2     45x1215   0.4472   D. Crosland, Graduate Training in Information Science
3     03x1210   0.3823   K. L. Taylor, In Information Science Education
4     21x1209   0.3660   ... Personnel: An Assessment and Projection
5     43x1206   0.3651   A. M. Rees, The Education of Science Information Personnel: A Challenge to the Library Schools

Results of Initial Search of Query 1

Fig. 1 (continued)
Following the retrieval of an initial set of abstracts, the query author can return to the console and give the system his estimated relevance decisions. Since a prime source of error in all document retrieval systems is the discrepancy between a query author's intended query and his expressed query, initial queries can often be greatly improved through a process known as relevance feedback. This process modifies the query by adding words used in the relevant documents to the query, thus enlarging and, hopefully, improving it. To improve his query, the user would re-enter the system and ask for a search on the original query plus the relevant documents. An example of such a re-entry to use feedback is shown in Fig. 2. In this case, the user asks to delete titles and uses only minimal replies. After the preliminary sign-on at the console, the user is asked whether he wishes to submit relevancy decisions for any active queries (in this case query 10). He must then indicate these decisions on a relevance scale from 1 to 5. After entering the decisions, the user asks for relevance feedback and gets the results in a manner similar to the search results in Fig. 1.
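The query modification described above (adding the words of the relevant documents to the query) can be sketched as follows. This is an illustration under assumptions: query and document vectors are term-to-weight dictionaries, the set of documents counted as relevant has already been derived from the user's 1-to-5 judgments, and the added weights are simply summed. The report does not give a weighting formula here, so the summation is a hypothetical choice.

```python
def feedback(query, relevant_docs):
    """Return a new query: the original terms plus every term of each
    relevant document, with weights summed (hypothetical weighting)."""
    new_query = dict(query)  # leave the original query untouched
    for doc in relevant_docs:
        for term, weight in doc.items():
            new_query[term] = new_query.get(term, 0.0) + weight
    return new_query


if __name__ == "__main__":
    original = {"information": 1.0, "science": 1.0}
    judged_relevant = [{"science": 1.0, "education": 2.0}]
    print(feedback(original, judged_relevant))
```

Terms that occur only in the relevant documents (here `education`) enter the query, which is the enlarging effect the text describes; terms shared with the query gain weight.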

For more experienced users, other procedures might be useful. A dictionary display to help the user construct more reasonable queries is possible, and various types of syntactic analysis can be used. The user can also alter the search methods by supplying his private search parameters instead of the standard system parameters.

Each of the various procedures available to users requires specific patterns of interaction between the console and the user. Table 1 contains a tabular display of portions of a proposed console interface. Only a few of the procedures are traced in full, as an example of how such an interface would be constructed. The importance of the table lies in its overall structure; the specific wording of the messages and the division of labor among table segments are of minor interest. However, it should be noted that console interaction is handled sequentially. Thus each user is associated with just one pointer indicating the segment to which he is replying.

Each table segment consists of one computer-to-console message, including a possible user response or system action. If an unanticipated response is obtained in a basic system, the text will be repeated in tutorial mode. In a more advanced system, special segments could be set up to handle unanticipated responses in special ways.
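Table 1 (segment 0) spells out a stepwise version of this rule: an unanticipated response or "?" moves a minimal-reply user to short replies, a short-reply user to tutorial replies, and a tutorial-reply user on to segment 9000, while explicit reply-class shifts are accepted at any time. A minimal sketch of that logic follows. The function name, the string normalization, and the use of `None` for "handled by the current segment" are assumptions made for illustration; segment 9000 itself is not described in this section, and the other segment 0 responses (DONE, the attention key) are omitted.

```python
SHIFTS = {"tutorial replies": "T", "short replies": "S", "minimal replies": "M"}
ESCALATE = {"M": "S", "S": "T"}   # minimal -> short -> tutorial
HELP_SEGMENT = 9000                # next segment once tutorial replies are exhausted


def handle_global(reply_class, response, anticipated):
    """Apply the segment 0 reply-class rules to one response.

    reply_class: current REPCLS ("T", "S" or "M").
    anticipated: lower-case responses the current segment expects.
    Returns (new REPCLS, next segment), where "R" means return to the
    segment being answered and None means the current segment handles it.
    """
    text = response.strip().rstrip(".").lower()
    if text in SHIFTS:
        return SHIFTS[text], "R"
    if text == "?" or text not in anticipated:
        if reply_class in ESCALATE:
            return ESCALATE[reply_class], "R"
        return reply_class, HELP_SEGMENT
    return reply_class, None
```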

Several responses are global in that they can appear at any time rather than in response to a specific SMART message. These are listed in Table 1 under segment 0 (e.g. reply class shifts). The normal form of a response is a key phrase followed by a carriage return. Some responses can include explicit requests for changes in parameter values at the user's option. For those responses which can take up more than one line, a period terminates the response.

Some responses, e.g. queries, can contain a number of periods and consist of more than one line. Such responses are terminated by a key phrase, e.g. "End of query.". To eliminate problems caused by missing periods, etc., a user should be required to enter at least one character within 10 seconds of a carriage return; otherwise the multiple-line response is considered complete. Such a rule is needed to prevent the system from waiting for user action while, at the same time, the user is expecting action by the computer.
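The termination rule for multiple-line responses can be sketched as below. The sketch assumes each typed line arrives together with the number of seconds that elapsed between the previous carriage return and the first character of that line; a real console handler would use a timed read instead. The 10-second limit and the "End of query." key phrase come from the text; the function and its parameter names are hypothetical.

```python
def collect_response(lines, key_phrase="End of query.", timeout=10.0):
    """Gather the lines of one multiple-line response.

    lines: iterable of (seconds since the previous carriage return, text).
    The response ends at a line ending in the key phrase (any letter case),
    or when the user waits longer than `timeout` seconds after a carriage
    return; the late line then belongs to the next response and is not kept.
    """
    collected = []
    for gap, text in lines:
        if collected and gap > timeout:
            break
        collected.append(text)
        if text.strip().lower().endswith(key_phrase.lower()):
            break
    return collected
```

Either condition ends the response, so a user who forgets the key phrase is not left waiting on a system that is itself waiting for him.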

Each reply text uses an ampersand to indicate a mandatory carriage return. Additional carriage returns are inserted as needed by a console message director, depending on the number of characters per line available on a
$ Proceed
- SMART.
- No title, minimal replies.
# Name?-
- Mike Lesk
# Access code?-
- XAQ13
# SMART XAQ13 Mike Lesk
# Relevancy decisions for active query 10?
- Yes.
# Documents 405 603 201 815 1000
- Decisions 3, 4, 3, 5, 1
# Abstract decisions?
- Yes.
# Relevance feedback?
- Yes.
# Search?
- Yes, search.
# Results of 3rd search of Query 10
- DONE
# Control is relinquished to the supervisor.
$ Proceed

Relevance Feedback
Fig. 2
specific console. A hyphen indicates that the console keyboard is unlocked for a user response. Each quoted anticipated response, such as the key phrase responses, can be abbreviated by typing only the capital letters specified in the response. All anticipated responses can be typed using any mixture of upper- and lower-case letters.
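Two of the console conventions just described can be sketched with Python's standard `textwrap` module: splitting a message at each ampersand and then wrapping each piece to the console's line length, and accepting a key phrase either in full or as the abbreviation formed by its capital letters, in any letter case. The function names are hypothetical, and treating spaces and a final period as insignificant when matching is an assumption.

```python
import textwrap


def format_message(text, width):
    """Split at each '&' (mandatory carriage return), then wrap each piece
    to the number of characters per line of the target console."""
    lines = []
    for piece in text.split("&"):
        lines.extend(textwrap.wrap(piece.strip(), width) or [""])
    return lines


def matches(response, key_phrase):
    """True if the response is the key phrase in full or its capital-letter
    abbreviation, ignoring letter case, spaces and a trailing period."""
    def normalize(s):
        return "".join(s.split()).rstrip(".").lower()

    typed = normalize(response)
    abbreviation = "".join(c for c in key_phrase if c.isupper()).lower()
    return bool(typed) and typed in (normalize(key_phrase), abbreviation)
```

For example, `matches("dq", "Delete Query.")` accepts the abbreviation, and `format_message` breaks a long prompt differently on a narrow teletype than on a wide one while always honoring the ampersands.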

The contents of the 'Internal Action' column are, for the most part, self-explanatory. The use of the variable READY is described later but is included in the Table for completeness. It indicates whether console interaction or internal work is needed.

The 'Next Segment' field indicates which segment is to be considered next. Often this depends on the response or the Internal Action field. An "R" indicates a return to whichever segment was previously considered. Each user is assigned variables to indicate the segment he is in and the line of text (for that segment's message) that is being transmitted. When a console joins SMART, logical control is first set at segment 9 if SMART is on-line; otherwise control is set at segment 1. Note that segments above 104 are not included in the Table, but would be set up in the same way as the other segments.
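The per-user bookkeeping described above (one segment pointer and one message-line counter per console, with the next segment chosen from the response) can be sketched as a small table-driven state machine. The table below holds only a few rows taken from Table 1 (segments 50 and 100 and two segment 0 responses); the class and field names are hypothetical, and unanticipated responses simply leave the user at the current segment rather than triggering the reply-class rules.

```python
from dataclasses import dataclass

# A few rows in the spirit of Table 1: segment -> {anticipated response: next segment}.
SEGMENTS = {
    50: {"query": 100, "done": 51},
    100: {"yes": 101, "no": 50},
    101: {},
}
# Segment 0: responses accepted at any time. "R" = return to the segment being answered.
GLOBAL = {"done": 51, "short replies": "R"}


@dataclass
class Console:
    segment: int   # the single pointer kept for each user
    line: int = 0  # line of the segment's message being transmitted

    def reply(self, response):
        key = response.strip().rstrip(".").lower()
        target = SEGMENTS.get(self.segment, {}).get(key, GLOBAL.get(key))
        if target is None or target == "R":
            # Unanticipated (left to other rules here) or "R": stay on this
            # segment and retransmit its message from the first line.
            self.line = 0
            return self.segment
        self.segment, self.line = target, 0
        return self.segment
```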

4. Console Driven Document Retrieval: An Internal View

This section describes a possible implementation of the on-line document retrieval system presented earlier. All routines available for batch SMART runs are usable without any reprogramming. An on-line executive program is, however, needed to drive the consoles and the batch routines.

A) The Internal Structure 

The internal structure needed for a prototype system must satisfy several goals. As indicated in the introduction, a prototype system must
Segment No. | Reply Class | Messages for Consoles | Anticipated Responses from Consoles | Internal Action | Next Segment
0   |  | (none) | "DONE." |  | 51
    |  |  | (Attention Key) | Delete transmission and activate keyboard | R
    |  |  | "Tutorial Replies" | REPCLS = Tutorial | R
    |  |  | "Short Replies" | REPCLS = Short | R
    |  |  | "Minimal Replies" | REPCLS = Minimal | R
    |  |  | "?" or an unanticipated response | If REPCLS = M then REPCLS = S | R
    |  |  |  | If REPCLS = S then REPCLS = T | R
    |  |  |  | If REPCLS = T | 9000
1   |  |  |  | SMART is on-line | 2
    |  |  |  | SMART is not on-line | 3
2   |  | #SMART is already on-line. You may not initiate a duplicate system. |  |  | 51
3   |  | #SMART is initiated. Your console is the master console. May other consoles attach to SMART?- | "Yes" | NEWCON = Yes | 3.5
    |  |  | "No" | NEWCON = No | 3.5
3.5 |  |  | (Reply Class Shift Only) |  | 4

a) Introductory Segments

SMART Console Interface

Table 1
Segment No. | Reply Class | Message for Consoles | Anticipated Responses from Consoles | Internal Action | Next Segment
5  | S | #What is your name?- | User's name | Store name | 6
   | M | #Name?- |  |  | 6
6  | S | #What is your access code? | "None" | Assign an access code | 7
   | M | #Access code?- | Access code | Verify code: OK | 8
   |   |  |  | Verify code: NOK | 9900
7  |   | #Your access code is |  | NUMCUS ← number of customers |
   |   |  |  | ACCODE ← user's new access code |
   |   |  |  | Store access code | 100
8  |   | #Welcome to SMART. ACCODE NAME |  | ACCODE ← access code |
   |   |  |  | NAME ← user's name as on file |
   |   |  |  | Does user have any unfinished queries? Yes | 3000
   |   |  |  | No | 100
9  |   |  |  | If SMART is on-line | 3.5
   |   |  |  | If SMART is off-line | 10
10 |   | #SMART is not now on-line. Retrieval will be available (time, day). |  |  | 51

a) Introductory Segments (contd.)

SMART Console Interface
Table 1 (continued)
Segment No. | Reply Class | Message for Consoles | Anticipated Responses from Consoles | Internal Action | Next Segment
50 | S | #Please select one of the following programs... Query, Analyze, Search, Display, Feedback, Pre-search, Search Options, Feedback Options, Analysis Options, Judgments, Done. | "Done." |  | 51
   |   |  | "Query." |  | 100
   |   |  | "Analyze." |  | 500
   |   |  | "Analyze, using XYZ strategy." | ANALPV = XYZ | 500
   |   |  | "Search." |  | 1000
   |   |  | "Search, using XYZ strategy." | SEARPV = XYZ | 1000
   |   |  | "Display." |  | 2000
   |   |  | "Feedback." |  | 3500
   |   |  | "Feedback, using XYZ strategy." | FEEDPV = XYZ | 3500
   |   |  | "Judgments." |  | 3000
   |   |  | "Pre-search." |  | 4000
   |   |  | "Analysis options." |  | 5000
   |   |  | "Search options." |  | 6000
   |   |  | "Feedback options." |  | 7000
51 |   | #Thank you for using SMART. #Control is relinquished. |  | READY = 0; TST = 0; return control of console to supervisor |

b) Central Director

SMART Console Interface
Table 1 (continued)
Segment No. | Reply Class | Message for Consoles | Anticipated Responses from Consoles | Internal Action | Next Segment
100 |   | #Do you wish to enter a query?- | "Yes." |  | 101
    |   |  | "No." |  | 50
101 | S | #Please enter your MAXQUEth query. #Type "End of query." when finished. |  | MAXQUE = MAXQUE + 1; NUMQUE = MAXQUE | 102
    | M | #Enter MAXQUEth query. |  |  | 102
102 |   | - | A line of a query. | Store line. Does line end in EOQ? Yes | 103
    |   |  |  | No | 102
103 | S | #Is your query ready for analysis?- | "Delete Query." | MAXQUE = MAXQUE - 1; delete query | 101
    | M | #Analyze?- | "Add to Query." / "Boolean." | Does user want to supply Boolean information? Yes | 104
    |   |  |  | No | 500
    |   |  | "Yes, Search." | DOANAL = 1 |
    |   |  | "Yes." | DOCENT = 1 |
    |   |  |  | DOSEAR = 1 | 500
    |   |  | "Yes, Search, using XYZ Strategy." | As above and SEARPV = XYZ | 500
    |   |  | "Yes, using XYZ Strategy." | DOANAL = 1; ANALPV = XYZ |
    |   |  | "No." |  | 50

c) Query Text Handling

SMART Console Interface
have the speed and ease of use of a production system, as well as the flexibility and measurability of an experimental system. A document retrieval system must provide fast on-line service and exhaustive, inexpensive off-line service. A typical first thought is simply to provide two systems: one for on-line work and the other for off-line work. However, a single, flexible system capable of handling both types of service is normally less expensive to develop, operate and maintain than two separate systems, provided a scheme with the needed features can be found.

The flexibility required to provide on-line and off-line service in a single package is best illustrated by the differing amounts of transmitted information. Off-line users will want, and can afford, to use large volumes of information. Such a volume of information cannot be transmitted at low cost to an on-line user, nor would an on-line user be able to cope with the quantity of information that is of use and interest to an off-line user.

Another illustration of the needed flexibility relates to machine storage. During off-hours, holding large amounts of storage for long periods of time may be possible. Most on-line requests, however, will be serviced during the day, when others also want to use the computer. To reduce costs, only a minimum of computer resources should be allocated to each specific task. Unfortunately, human response times are much slower than normal computer response times when the computer is being used for batch processing. For example, a complete off-line search for 42 queries and 1400 documents can be completed in less real time than a single on-line query, because of the slowness of human response. (The 42-query search does, however, require more processing time.)
B) General Characteristics of SMART Routines

To satisfy the need for flexibility and modifiability, SMART is programmed as a set of small, clearly defined, and well documented Fortran subroutines. Each subroutine accomplishes one task with a minimal interface to other routines. Each SMART routine lies in a distinct class, depending on the amount of structure in the data used or manipulated. At the bottom of the pyramid are the I-O routines and the MOVE routines (which move sets of sequential locations from one place to another). These routines "know" only the length and origin of the fields with which they deal.

Next in the hierarchy are routines which deal with the various kinds of vectors. SMART uses several kinds of vectors, all consisting of a "head" indicating the length of the vector, followed by information in double words. In the case of concept vectors, these double words contain concepts and weights; in the case of result vectors, the first word contains the document number and the rank retrieved (each in a half word), and the second word contains the correlation of the document with the query. The routines that deal with these vectors "know" the internal structure of the vectors. Examples of this class of routine are LSTCON, which prints the contents of a concept vector, and RESULT, which prints the contents of a vector of document-query correlations.
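The vector layout can be illustrated with Python's `struct` module. This is a sketch under stated assumptions: a 4-byte word, big-endian byte order (as on the IBM 360), a head word holding the number of double-word entries (the text says only that the head indicates the length), 32-bit integer concept numbers, and IEEE single-precision floats for weights and correlations. The 360 used its own hexadecimal floating-point format, so actual SMART files would not decode this way; the function names are hypothetical.

```python
import struct

HEAD = ">i"      # assumed: one big-endian word giving the number of entries
CONCEPT = ">if"  # concept number, weight (one double word)
RESULT = ">HHf"  # document number and rank (half words), then correlation


def pack_concepts(pairs):
    """Build a concept vector: head word, then (concept, weight) double words."""
    return struct.pack(HEAD, len(pairs)) + b"".join(
        struct.pack(CONCEPT, concept, weight) for concept, weight in pairs)


def pack_results(rows):
    """Build a result vector from (document number, rank, correlation) rows."""
    return struct.pack(HEAD, len(rows)) + b"".join(
        struct.pack(RESULT, doc, rank, corr) for doc, rank, corr in rows)


def unpack(buf, entry_format):
    """Read the head word, then that many fixed-size entries."""
    (count,) = struct.unpack_from(HEAD, buf, 0)
    start = struct.calcsize(HEAD)
    size = struct.calcsize(entry_format)
    return [struct.unpack_from(entry_format, buf, start + k * size)
            for k in range(count)]
```

Only the routines at this level need to know the entry formats; a MOVE-style routine would just copy the `4 + 8 * count` bytes without interpreting them.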

Above this level are routines which deal with groups of vectors. These are the routines which know that many queries exist in the system. Typical of these routines is BLOCK, which combines the result vectors for the several iterations of one query during a batch run and passes the combination, one query at a time, to RESULT.


At the top of the entire pyramid are the routines EXEC and ONLINE.
EXEC is a card-controlled driver for the system. It is normally used for
batch experimental work and jobs typically done off-line, such as the
addition of new text and centroid generation. ONLINE is normally used to
control on-line document retrieval. A partial tree of SMART routines
showing this structure is given in Fig. 3.

C) Pseudo-Batching

Basic to an understanding of the mechanism proposed for document
retrieval is the idea of pseudo-batching. In any reasonable batch-processing
document retrieval system, a large number of queries are handled
in parallel. This serves to reduce the fixed overhead per query to a
fraction of the total overhead. So long as the increased expense of dealing
with several queries is kept small, there is a net gain in effectiveness
per unit cost.

A basic problem in an on-line document retrieval system is that
each search passes through different stages with different requirements.
This presents problems because of the multiplicity of distinct programs
which may be required, as well as the input-output problems. If each query
were multiprogrammed with other queries, severe competition for resources
would result. One query would need document files, another dictionaries, and
yet another would require text files. A complicated scheduling algorithm
would be required to untangle the requirements for file access facilities
and storage space; this would increase overhead costs sharply.

In an on-line system where many users individually cycle through
the same set of routines and files, a much better utilization of resources
results from batching the incoming queries. If the system processes only
those queries available at the start of a (twenty-second) cycle, competition
for resources is eliminated. Each query would then take thirty seconds on
the average: twenty seconds of actual processing and ten seconds of waiting,
since a query arriving at a random moment waits on average half a cycle for
the next cycle to begin.

Structure of SMART Routines

Fig. 3
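The thirty-second figure follows from assuming that queries arrive at random moments, so that a query waits on average half a cycle before the next cycle begins. The short Python sketch below, with hypothetical function names, checks that arithmetic by averaging the turnaround over arrival times spread evenly across one twenty-second cycle.

```python
import math

CYCLE_SECONDS = 20.0  # cycle length used in the text


def turnaround(arrival, cycle=CYCLE_SECONDS):
    """Seconds from arrival to completion when a query must wait for the
    next cycle boundary and is then processed for one full cycle."""
    next_start = math.ceil(arrival / cycle) * cycle
    return (next_start - arrival) + cycle


def average_turnaround(samples=1000, cycle=CYCLE_SECONDS):
    # Arrivals at the midpoints of `samples` equal slices of one cycle,
    # a deterministic stand-in for uniformly random arrival times.
    total = sum(turnaround(cycle * (i + 0.5) / samples, cycle)
                for i in range(samples))
    return total / samples


if __name__ == "__main__":
    print(round(average_turnaround(), 6))  # about 30 s: 20 s processing + 10 s wait
```

The model treats a query arriving exactly on a cycle boundary as available at the start of that cycle; any other convention changes only a set of measure zero and leaves the average unchanged.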

Many advantages accrue to the overall system, and thus to the user, from
the batching of queries. Of greatest importance is the resulting
lack of competition for different files or for space to store them. Secondly,
each query has an apparent overhead considerably less than it would have
if it were the only query to use a file at a given time. Obviously, lower
overhead means lower cost.

D) Attaching Consoles to SMART 

Since one can assume that consoles will not be continuously dedicated
to a document retrieval system, at least in an experimental environment,
provision must be made for transfer of control of a console from the computer
supervisor to SMART. If SMART is core-resident and a specific console is
wanted for SMART, the process is as simple as obtaining additional disk space
or more core. However, it is desirable that a user be able to go to any
available, supervisor-controlled console, and that the console be transferred
to SMART at the user's initiation. Under such circumstances, the possibility
also exists that SMART is not available on-line at some given time. Naturally,
the problems and cost of serving additional users are far less when SMART is
already on-line than when SMART must be started for the first user. Since
SMART is intended to permit anyone to utilize the document retrieval system,
provision must be made to prevent the occurrence of unreasonable expenses.

One obviously unreasonable expense is the improper activation of SMART. 

Another problem is the need to keep to a minimum the actions which the
typical, non-computer-oriented user must carry out to use the SMART system
on-line.

For these reasons SMART could include a small routine that is continuously
a part of the supervisor. Normally, after a user has activated
a console (e.g., by dialing the computer if telephone lines are used), the
computer expects the name and account number of the user (in order to prevent
unauthorized usage). The user may then enter simply the word "SMART".
This will cause the execution of a program called SMTLATCH, which is supplied
with the "name" of the console presently wanting SMART. This code will
"know" whether SMART is on-line or not.

If SMART is not on-line an appropriate response is made. (An
example is presented in Fig. 4.) If SMART is on-line, the console number
of the new user will be made available to the normal SMART programs and a
flag will be set indicating that a new console needs to be attached. When
SMART regains use of the computer, the supervisor can be requested to
transfer control of that console to SMART.



(Dial computer and press carriage return.)
Proceed.

SMART.

SMART will be available next at 3 p.m.
Tuesday, October 4, 1968.

Proceed.



Console Response to a Request for SMART
When SMART is not On-line

Fig. 4
E) Console Handling: The Supervisor Interface

SMART will not need to worry about physical control of the consoles.
Rather, SMART provides a routine which the supervisor can call whenever a new
line is available from a console. The console keyboard is then locked (i.e.,
nothing more can be typed by the user) until SMART allocates space for a
new line somewhere in a SMART section of memory and so tells the supervisor.
Alternatively, at this time, SMART can transmit a line to the console.
Normally the console keyboard will be freed fast enough (if multi-line input
is anticipated) so that the user will be unaware of the keyboard ever being
locked.

When SMART wishes to write on a console (which includes unlocking
the console keyboard), a call to the supervisor is made with the location
of a message and the name of the specific console on which the message is to
appear. If the keyboard of that console is locked, the message is immediately
transmitted. If the keyboard is not locked, the transmission is refused and
SMART will have to lock the keyboard first and accept whatever message was
transmitted. (On the equipment presently available the console cannot be
locked by the program; only the user can lock the keyboard, by pressing
"Attention" or "Carriage Return"; the system must therefore wait for user
action.)
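The write protocol above can be modelled as a small state machine. In this Python sketch (all class and function names are hypothetical), the supervisor transmits only when the keyboard is locked, and transmitting unlocks it; when a write is refused, SMART holds the message until the user locks the keyboard by sending a line, then accepts that line and transmits the held message. This reflects the stated constraint that only the user can lock the keyboard on the equipment then available.

```python
from collections import deque


class Console:
    def __init__(self):
        self.keyboard_locked = False  # only the user can lock it
        self.pending_line = None      # line the user has just transmitted
        self.screen = []              # messages the console has received

    def user_transmits(self, line):
        # Pressing Carriage Return (or Attention) sends a line and locks the keyboard.
        self.pending_line = line
        self.keyboard_locked = True


def supervisor_write(console, message):
    """Transmit only if the keyboard is locked; writing also unlocks it."""
    if not console.keyboard_locked:
        return False                  # transmission refused
    console.screen.append(message)
    console.keyboard_locked = False
    return True


def smart_write(console, message, held):
    """SMART's request to write; a refused message is held for later."""
    if supervisor_write(console, message):
        return True
    held.append(message)
    return False


def on_user_line(console, held):
    """Called once the user has locked the keyboard: accept the user's line,
    then send any message SMART was holding."""
    line, console.pending_line = console.pending_line, None
    if held:
        supervisor_write(console, held.popleft())
    return line
```

Holding refused messages in a queue, rather than polling the console, matches the text's point that the system must simply wait for user action.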

F) Parameter Vectors

As each enquirer is introduced to SMART, he is associated with a
user vector that contains pointers to parameter vectors. These vectors are
filled with information taken from control cards during a batch processing
run, from a default vector for new on-line users, or from personal parameter
vectors. These parameters supply values needed to control the action
of the retrieval routines. Each user may define his own personal parameter
vectors, which can be saved for use on many searches.

G) The Flow of Control

The flow of batched queries is simple compared to that of on-line
queries. Although batched and on-line queries use different
means to fill parameter vectors, and take different action with respect to
the output of most routines, these differences are unimportant.

The manner of introducing an on-line user has already been described.
(As far as SMART is concerned, a user and the console he is then using are
equivalent in all ways. Thus, wherever the word 'console' appears, the
word 'user' could be substituted.)

The on-line control program consists of two logically distinct
routines. CONSOL handles physical communications with the consoles on an
interrupt basis (i.e., in real time). CYCLE handles the use of core and the
large system files by cycling among them, satisfying users as it can.

Logical control of each console shifts between CONSOL and CYCLE. 

The SMART On-line Console Control Block (SOCCB) indicates at any
given instant which routine is logically in command of a console. The
SOCCB synchronizes the real-time routine CONSOL with the process-time
routine CYCLE. The READY flag associated with each console takes on certain
values if the console is awaiting completion of a task done by CYCLE. When
CYCLE is finished, it changes the READY flag; seeing the changed value,
CONSOL can recognize that it should proceed with that console.

Testing READY flags (for up to 256 consoles) is accomplished by a
single instruction (Translate and Test, TRT) using a 256-byte array. Since
the test is fast, it can be carried out frequently by both CONSOL and CYCLE.
For example, after sending each line of a message to one console, CONSOL can
test to see if any other console requires service for a single line. If so,
the servicing of the one console with a series of lines is interrupted, and
the consoles with single-line needs are handled. CONSOL then returns to
the multi-line message and finishes it. CYCLE uses the speed of the TRT
instruction to locate those queries needing a specific process. After
each CYCLE-driven routine finishes with a batch of queries, the table can
be scanned to see if, in the meantime, any other queries now need that same
process. Some routines which can be logically divided into two parts, one
essentially in-core and the other necessitating file accessing, could be
programmed to check for "latecomers" to speed up overall response without
losing the advantages of cycling.

For a list of typical READY flags see Table 2.

H) Timing Considerations 

In order for the type of organization presented to be acceptable
to non-SMART users of the computer, two timing considerations are paramount.
First, the CONSOL routine must be assigned highest priority by the supervisor,
since it must respond to on-line signals. CYCLE is assigned the second
highest priority. This implies that if CYCLE is free to perform work, the
CPU is taken away from any other executing program (except CONSOL and the
supervisor itself). Normally, however, CYCLE is I-O bound. While CYCLE is
waiting for needed information from non-core-resident files, and when CYCLE
has no work to do, the CPU is able to do the work of other customers.

Thus, CONSOL must have available everything it needs to work, and
CYCLE must contain no wait loops of any size. If information is not available,
the supervisor must be given control until the required information is
available.
[Fig. 5 diagram. Components shown: Non-SMART Programs, Supervisor, CONSOL, CYCLE, Consoles, User Parameter Vectors, User Histories, and the SMART On-Line Console Control Block; CYCLE-driven processes Centroid Searching (Centroid Concept Vectors), Pre-search Display (Vocabulary), Text Cracking (Vocabularies), Document Searching (Document Concept Vectors), Post-search Display (Text), and Query Update (Document Concept Vectors). Legend: core-resident; auxiliary; H.R. = human readable.]

SMART On-Line Console Control Block

  TST    Console Number    User Vector    READY Flag
  255          45              3408            34
  255          12              3479            34
    0           0                 0             0
  255           1              2202             4
    0           0                 0             0

SMART On-line Control Logic

Fig. 5
Routine Needed   READY   Meaning

(none)             0     Unused slot.
CONSOL             1     Newly arrived console, no assigned user vector.
(none)             2     Console keyboard unlocked for user transmission.
CONSOL             3     Console keyboard locked by receipt of a user transmission.
(none)             4     One-line message going to console.
CONSOL             5     Console keyboard locked; further lines are needed.
CYCLE              6     Allocate core.
CONSOL             7     Core allocated.
CYCLE             20     Crack text.
CYCLE             21     Cracking text.
CONSOL            22     Text cracked.
CONSOL            23     Notifying user.
CYCLE             24     Set up pre-search display.
CYCLE             25     Setting up pre-search display.
CONSOL            26     Pre-search display set up.
CONSOL            27     Displaying to user.
CYCLE             40     Search centroid tree.
CYCLE             41     Searching centroid tree.
CONSOL            42     Centroid tree searched.
CONSOL            43     Informing user of results of tree search.

READY Flags

Table 2
I) Noncore Resident Files

Before going into CONSOL and CYCLE in detail, each of the files used by
SMART is introduced briefly. The various logical segments of core are then
similarly defined to provide a reference and to eliminate detailed descriptions
within succeeding sections.

SMART files can be divided into three distinct classes: those used
by CYCLE, those used by CONSOL, and the consoles themselves. The console
files are basically standard sequential files, with, however, an unpredictable
access time. As with sequential files, records are read (or written) one at a
time and in linear order. There is, of course, no backspacing, rereading, or
overwriting.

CONSOL deals with three files of a more familiar nature. The
'SMART Statistics File' is a sequential, write-only file on which is placed
information to enable evaluation of SMART's performance by supervisory
staff. Information such as observed user and SMART response times, and
statistics on query authors using the system, might be kept.

The 'User History File' retains information about unfinished queries
on an individual user basis. For each user, such information as the number
of queries he has submitted to the system, the number still active, and
accounting information may be kept. For each active query, a record is kept
of the text of the original query, and of the last active concept vector
for that query. Perhaps a list of additional documents, unseen by the user,
should be kept to try to forestall a complete lack of positive feedback. In
this manner costs could be kept reasonably low for a majority of users by not
showing many documents except when necessary. One might also want to keep
some type of record of the searched centroid tree so that "obviously" unsuitable
tree nodes would not have to be reconsidered during relevance feedback.

The 'User Parameter Vector File' contains user parameter vectors.
Each user can have several different parameter vectors (with distinct names)
for different purposes. The only reason for separating this file from the
previous file is that this file is essentially a read-only file, whereas
the previous file is updated with every system access. It is anticipated
that the directory for this file would be one part of the preceding file.

The files used by CYCLE-called routines are of two distinct types:
human readable and machine readable. The human readable files contain
information suitable for display to normal users at consoles. The other
files, however, are organized for maximum speed of access and minimum space
for storage of information used solely by SMART. A complete system must
have two kinds of human readable files: the vocabulary aid files and the
source text files. Vocabulary aid files contain thesaurus expansions,
hierarchies, frequency lists, etc. Source texts contain titles and abstracts
of documents in a form suitable for on-line display. Normally vocabulary aids
are used prior to a search and texts after a search.

There are three machine-readable classes of files: vocabulary files,
files of centroid concept vectors, and files of document concept vectors.
Vocabulary files contain the information needed to quickly understand input
text (i.e., to convert raw text into a standard concept vector). The
separation of centroid and document concept vectors into two distinct files
is dictated by the relative sizes of the two files. Commonly, a centroid has
over 10 sons; thus a centroid tree for a file of 100,000 documents would
contain fewer than 9,000 nodes. In most situations, the centroids could be
accessed faster as a separate data set because of their smaller volume.

There also exists a file which contains the programs called by CYCLE.
In order to further conserve space, it may be desirable that these mutually
exclusive routines be overlaid during execution.

J) Core Resident Files

Seven types of core resident files are used by SMART. They have differing
typical lifetimes, lengths, sources, and destinations. Because of
their differing lifetimes, they are allocated from different pools of
available core. This minimizes a serious tendency to fragment core and
eliminates a need for dynamic relocation of in-core files. By permitting the
system to obtain variable amounts of core, SMART is able to work in 50 K or
500 K, albeit with grossly different response times and CPU utilization rates.

The first file is the previously mentioned SMART On-line Console
Control Block (SOCCB). This block is the key to the entire control of the
on-line system and is, therefore, described in detail in the next section.
The size of the SOCCB is fixed, when SMART is initiated, by the number of
consoles to be accepted on-line at one time. This block is retained
until SMART goes off-line.

Each user is assigned a user vector. This block is of fixed length
and is retained as long as the user is on-line. The user vector contains
pointers to the locations of dynamic fields "owned" by the given console.
These fields include parameter vectors, buffers and correlation vectors.
The user vector is accessed only by CONSOL and CYCLE.

The parameter vectors contain values for variables used to control
the various routines. Each routine needs its own parameter vector. There
exists a standard default parameter vector for every routine, and these
standard vectors are core-resident for the life of a given invocation of
SMART. Any user vector can point to one of these default vectors; however,
no user can change values in the default vectors. If a user wishes to change
any values, space is allocated for his own individual parameter vector for
each routine the user wishes to control in a nonstandard fashion. A user may
name his parameter vectors in order to re-use them easily. An individual
parameter vector is core-resident only for the duration of the process
which that vector controls.
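The sharing rule for parameter vectors resembles copy-on-write: every user vector starts by pointing at the shared, read-only defaults, and a private copy is allocated only for a routine whose values the user changes. The Python sketch below illustrates that rule; the routine names (SEARCH, DISPLAY), parameter names, and values are hypothetical placeholders, and `MappingProxyType` stands in for the protection of the default vectors.

```python
from types import MappingProxyType

# Shared defaults, one per routine (routine and parameter names are hypothetical).
DEFAULT_VECTORS = {
    "SEARCH": MappingProxyType({"cutoff": 15, "iterations": 1}),
    "DISPLAY": MappingProxyType({"lines_per_page": 10}),
}


class UserVector:
    def __init__(self):
        # Initially every routine's entry points at the shared default vector.
        self.params = dict(DEFAULT_VECTORS)
        self.saved = {}  # named personal vectors: name -> (routine, values)

    def set_param(self, routine, name, value):
        vector = self.params[routine]
        if vector is DEFAULT_VECTORS[routine]:
            vector = dict(vector)          # allocate a personal copy first
            self.params[routine] = vector
        vector[name] = value               # defaults are never modified

    def save_as(self, routine, label):
        self.saved[label] = (routine, dict(self.params[routine]))

    def load(self, label):
        routine, values = self.saved[label]
        self.params[routine] = dict(values)
```

Routines the user never adjusts keep pointing at the single shared default, so only the vectors actually changed cost any extra core.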

Buffers contain a line or a track of information. They typically
have a short lifetime, and the space occupied by the buffers is reutilized
at a high rate. Buffers to or from a single file can be linked while
in core. These vectors constitute the majority of core needed by SMART. In
some cases, it may be desirable to keep some buffers in core in anticipation
of repeated use. If sufficient core is available, this can be done.
However, this in-core saving of a buffer is unknown to all routines except
the buffer manager. This permits a routine to use 50 K or 500 K bytes
without any internal knowledge. Only the response times to requests for a
buffer will differ depending on the amount of core utilized.
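The point of hiding buffer retention inside the buffer manager is that callers see identical results whatever the amount of core; only the number of file reads changes. The Python sketch below shows this with a least-recently-used policy, which is an assumed choice (the report does not say how retained buffers are selected); `read_track` and the class name are hypothetical.

```python
from collections import OrderedDict


class BufferManager:
    """Callers ask for a track; whether it comes from a retained in-core
    copy or from a fresh read is invisible to them."""

    def __init__(self, read_track, capacity_tracks):
        self.read_track = read_track      # function (file, track) -> bytes
        self.capacity = capacity_tracks   # tracks that spare core allows us to keep
        self.cache = OrderedDict()
        self.reads = 0                    # actual file accesses performed

    def get(self, file, track):
        key = (file, track)
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as most recently used
            return self.cache[key]
        data = self.read_track(file, track)
        self.reads += 1
        if self.capacity > 0:
            self.cache[key] = data
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return data
```

Running the same request sequence through a manager with no spare core and one with room for several tracks returns the same data; only `reads`, the stand-in for response time, differs.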

The concept vectors constitute the output of the routines converting
text into concept vectors, and of the query update routines. These vectors
are much shorter than the text they represent, and they can be more easily
utilized for search purposes. Only one concept vector per user need be
kept in core, and the concept vector supplants the buffers containing the
original query.

Specification and correlation vectors contain the names of individual
centroids or of documents to be matched with a query, and later, the
correlations with those items. The life of these vectors is short, but the
core requirements for a single query can be determined only dynamically.

Result vectors are shortened correlation vectors. They are used by 
CONSOL to pass information to the consoles. 

5. CONSOL: A Detailed Look

Once the overall structure of the proposed on-line system is understood and
the contents of the various files are understood, a detailed explanation of
the operation of the two major routines becomes straightforward. CONSOL will
be considered first since it is first logically. Before going into the routine
itself, the SMART On-line Console Control Block (SOCCB) is described.

A) Competition for Core

It is possible that one user may finish a line and the interrupt-called
supervisor can start CONSOL, while a second user finishes his line before
CONSOL finishes with the first user. The second user's finish would cause the
supervisor to start CONSOL again. A routine like CONSOL is called reentrant if
several different processes (users) can simultaneously execute it. On a
single-CPU machine like Cornell's 360/65 the simultaneity is only apparent and
is due to interrupts; on a multiple-CPU machine, however, the simultaneity
could be real. In both cases the problem is the same: no process can know if
another process is also executing the same code. The requirement is that no
"edition" of a reentrant routine can change core locations possibly known to
another "edition" of that routine. If the reentrant routine must obtain
additional core, the same problem exists: two editions may try to take the
same space. A similar problem arises between CONSOL and CYCLE: CYCLE could be
claiming an area of core while at the same time CONSOL decides to use that
same area.

In order to prevent destructive competition for ownership of resources,
the 360 provides a single instruction which locks a resource as it tests
that resource for availability. The instruction is called Test and Set
(TST). Basically, TST sets a byte non-zero and sets the condition code to
zero or non-zero according as the previous contents of the byte were zero or
non-zero, in one inseparable step. (The TST instruction is outlined in Fig. 6.)
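The essential property of TST is that the test and the set cannot be separated by another process. In the Python sketch below, a `threading.Lock` stands in for that hardware guarantee (an assumption of the model, not how the 360 achieves it), and several threads race to claim the same LOCK byte; exactly one sees a zero condition code, matching the two cases of Fig. 6. All names are hypothetical.

```python
import threading


class Core:
    """A few bytes of memory with an atomic test-and-set."""

    def __init__(self, size):
        self.bytes = bytearray(size)
        self._inseparable = threading.Lock()  # models the single-instruction guarantee

    def test_and_set(self, address):
        """Set the byte to 255; return condition code 0 if it was zero,
        1 (non-zero) if it was already set."""
        with self._inseparable:
            previous = self.bytes[address]
            self.bytes[address] = 255
        return 0 if previous == 0 else 1


def race_for_lock(contenders=8):
    core = Core(1)
    codes = []
    codes_guard = threading.Lock()

    def contender():
        cc = core.test_and_set(0)
        with codes_guard:
            codes.append(cc)

    threads = [threading.Thread(target=contender) for _ in range(contenders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return codes, core.bytes[0]
```

Whichever thread wins, the byte ends at 255 and every loser sees a non-zero condition code, so no two "editions" can believe they own the same resource.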



B) The SMART On-line Console Control Block

The SMART On-line Console Control Block (SOCCB) shown in Fig. 5 holds
four items for each active user. The maximum number of consoles that can be
on-line at one time is decided when SMART is first entered; MAXUSERS contains
this number. The fields marked TST and READY (in Fig. 5) are each vectors
of MAXUSERS consecutive bytes. The TST field contains zero if that particular
line is unused. When a line is reserved for a particular console, the
TST field is set non-zero. The Console Number field contains the supervisor
number for a console, and the User Vector field contains the location
of the user vector for that console.



TST LOCK (instruction)

                             Before       After
                             Execution    Execution
Case 1   Location LOCK          0            255
         Condition Code         -            zero
Case 2   Location LOCK        255            255
         Condition Code         -            non-zero

The Test and Set Instruction (TST)
as Applied to the Location Named LOCK

Fig. 6
C) The READY Flag and the TRT Instruction

The READY field contains one of 256 possible flag values. Each flag (value)
indicates what process is then needed by that user. Typical values are
given in Table 2. To understand the use of the vector, one needs to understand
the Translate and Test (TRT) instruction. This instruction considers
two read-only vectors. The first vector is the vector of READY values; the
second contains a table of 256 bytes. This table contains zero bytes
except in those bytes whose address (relative to the first byte of the table)
is the same as a READY value which must be tested. The TRT instruction takes
bytes from the first vector, one at a time, and looks at the table entry
corresponding to the value of that byte. If the object (table) byte is zero,
the next READY value is considered; if the object byte is non-zero, the
instruction stops at that object byte and the location of the source byte is
made available. If no byte stopped the instruction, that fact is so indicated.
If the instruction is stopped by a non-zero object byte, the
registers used by the instruction are left in a condition such that the
instruction can be reexecuted for the remaining bytes in the source vector.
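The scan just described can be emulated in a few lines of Python. The sketch below (function names hypothetical) returns the condition code, the index of the source byte that stopped the scan, and the table byte found there, using the System/360 convention that the condition code is 0 when no non-zero table byte is met, 1 when the scan stops before the last source byte, and 2 when it stops on the last byte. Restarting just past the stopping byte plays the role of reexecuting the instruction.

```python
def trt(source, table, start=0):
    """Emulate Translate and Test over source[start:].
    Returns (condition_code, index, function_byte)."""
    for i in range(start, len(source)):
        function_byte = table[source[i]]
        if function_byte != 0:
            cc = 2 if i == len(source) - 1 else 1
            return cc, i, function_byte
    return 0, None, 0


def make_table(ready_values):
    """256-byte table that is non-zero only at the READY values to be tested."""
    table = bytearray(256)
    for value in ready_values:
        table[value] = 1
    return bytes(table)


def consoles_needing(ready, ready_values):
    """Indices of all SOCCB rows whose READY flag is one of ready_values."""
    table = make_table(ready_values)
    found, position = [], 0
    while True:
        cc, index, _ = trt(ready, table, position)
        if cc == 0:
            return found
        found.append(index)
        if cc == 2:           # stopped on the last byte: nothing left to scan
            return found
        position = index + 1  # "reexecute" for the remaining bytes
```

For example, with READY flags `bytes([0, 3, 20, 3, 40, 0])`, asking for value 3 yields rows 1 and 3, and asking for values 20 and 40 together yields rows 2 and 4 in a single scan.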

A pictorial explanation of the TRT instruction is given in Appendix 1.

For internal convenience, READY values are often assigned in blocks,
each block associated with a given process. Most processes can be divided
into four phases: unconsidered by CYCLE, being considered by CYCLE,
unconsidered by CONSOL, and being considered by CONSOL. Some READY values
appearing in Table 2 show this assignment.
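The block assignment can be made concrete with the values in Table 2, where each process occupies four consecutive READY values in the phase order just listed (for example, 20 to 23 for text cracking). The Python sketch below encodes and decodes such values; the block bases are copied from Table 2, while the names are hypothetical.

```python
from enum import IntEnum


class Phase(IntEnum):
    AWAIT_CYCLE = 0   # unconsidered by CYCLE
    IN_CYCLE = 1      # being considered by CYCLE
    AWAIT_CONSOL = 2  # unconsidered by CONSOL
    IN_CONSOL = 3     # being considered by CONSOL


# First READY value of each four-value block, as listed in Table 2.
BLOCK_BASE = {"crack text": 20, "pre-search display": 24, "search centroid tree": 40}


def ready_value(process, phase):
    return BLOCK_BASE[process] + phase


def decode(value):
    """Map a READY value back to (process, phase), or None if not in a block."""
    for process, base in BLOCK_BASE.items():
        if base <= value < base + 4:
            return process, Phase(value - base)
    return None


def routine_needed(value):
    _, phase = decode(value)
    return "CYCLE" if phase <= Phase.IN_CYCLE else "CONSOL"
```

Because the phase is just an offset within the block, advancing a console to the next phase is a single increment of its READY byte.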

D) The Routines LATCH, CONSIN, and CONSOT 

When a person types "SMART" on a console, the supervisor transfers
control to SMTLATCH. SMTLATCH interrogates the variable SMTOPEN. If SMTOPEN
is zero, SMTMSG (containing the appropriate message) is sent out to the
calling console. If SMTOPEN is non-zero, control is transferred to (the
location contained in) SMTOPEN. SMTLATCH, including SMTOPEN and SMTMSG,
is always available to the supervisor as a standard supervisor process.
Since SMTLATCH takes only 96 bytes, it can be kept constantly core-resident.

The first routine called when SMART is started in the standard
manner is ONLINE, which inserts the location of the routine LATCH at
SMTOPEN. When SMART no longer wishes to accommodate new users, the
routine OFFLINE updates SMTMSG to indicate the next scheduled time for
on-line document retrieval; finally, SMTOPEN is set to zero. Consoles
already active in the system can still be accommodated in any suitable manner.

When LATCH is called, an unused row is located in the SMART
On-line Console Control Block (SOCCB), using the TST to ensure that the
selected row is indeed available. LATCH then changes READY for that row
to 1 (from 0) and stores the name of the console in the SOCCB. If CONSOL
is running, LATCH simply returns to the supervisor (which will restart
CONSOL where CONSOL was interrupted). If CONSOL is not running, LATCH
causes the supervisor to mark CONSOL as runnable. LATCH then returns to
the supervisor. The new console will be noted in due course by a TRT
in CONSOL.

Routine CONSIN is similar to LATCH. When a console is released to
a user, the supervisor needs the name of a routine to call when the
transmission from the user is complete, as well as a place to put the
transmission; CONSIN is that routine. The supervisor tells CONSIN the name of
the console which interrupted; CONSIN then changes the READY flag for the
console (from 2 to 3) and ensures that CONSOL is running.
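A compact model of LATCH and CONSIN operating on the SOCCB is sketched below in Python. The class layout follows the four SOCCB fields; a `threading.Lock` again stands in for the indivisibility of TST, and `consol_runnable` stands in for asking the supervisor to mark CONSOL runnable. Method and attribute names are hypothetical. One detail is a choice of this sketch rather than something the text specifies: LATCH stores the console name before setting READY to 1, so that CONSOL never sees a new console whose name is not yet filled in.

```python
import threading


class SOCCB:
    def __init__(self, max_users):
        self.tst = bytearray(max_users)       # 0 = row unused
        self.console = [0] * max_users        # supervisor console number
        self.user_vector = [None] * max_users
        self.ready = bytearray(max_users)     # READY flags (Table 2)
        self._inseparable = threading.Lock()  # models TST's single step
        self.consol_runnable = False

    def _claim(self, row):
        with self._inseparable:               # test and set TST[row]
            was_free = self.tst[row] == 0
            self.tst[row] = 255
        return was_free

    def latch(self, console_number):
        """Reserve an unused row for a newly arrived console."""
        for row in range(len(self.tst)):
            if self._claim(row):
                self.console[row] = console_number  # name first ...
                self.ready[row] = 1                 # ... then READY 0 -> 1
                self.consol_runnable = True
                return row
        return None                                 # SOCCB full

    def consin(self, console_number):
        """A user transmission has arrived from console_number."""
        row = next(r for r in range(len(self.tst))
                   if self.tst[r] and self.console[r] == console_number)
        if self.ready[row] == 2:     # keyboard was unlocked for the user
            self.ready[row] = 3      # now locked by the user's transmission
        self.consol_runnable = True
        return row
```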

To minimize overall response times, only one line will be set up
for transmission to a console if another console also needs service. If a
console needs several lines but only one is transmitted, CONSOL will have
to prepare the other lines at a later time. To do this on an interrupt basis,
routine CONSOT is called by the supervisor after transmission of a line to
a console if that console requires more information.

All of these routines consist of fewer than a hundred instructions
and take less than a millisecond to execute. Fast response to the changes
made in the READY table is ensured, since CONSOL tests the flags after each
line of a transmission is complete. The test for a console needing attention
takes less than fifteen microseconds if no console needs attention (assuming
ten on-line consoles). Since the test is so fast, frequent repetition is
not expensive.

E) CONSOL as a Traffic Controller 

In basic terms, CONSOL uses the TRT instruction to select a console
which has a need and then satisfies the needs of that console, at least
temporarily. CONSOL then uses the TRT again to select another console.
Eventually all console needs will be satisfied and CONSOL will retire to
permit other processes to use the CPU; one of these processes will most
likely be CYCLE. When CYCLE has completed a request for a user, or a set
of requests, CYCLE will ask the supervisor to restart CONSOL and, by so
doing, suspend itself in real time (but not in process time). Alternatively,
the completion of a user line at a console will result in an
interrupt-initiated call to CONSIN or LATCH, which can wake up CONSOL.
Effectively, then, CONSOL uses the TRT instruction to solve a
traffic-direction problem.
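The traffic-controller behaviour can be sketched as a loop that repeatedly picks the first SOCCB row whose READY value CONSOL handles, services it, and rescans, retiring when nothing is left. In the Python sketch below a plain linear scan stands in for the TRT instruction, and the handler table, which maps a READY value to the new value after service, uses transitions chosen only for illustration (for example, a received user line is handed to CYCLE as READY 20); they are assumptions, not the report's actual state table.

```python
def consol_activation(ready, handlers):
    """Service consoles until none needs CONSOL, then return (CONSOL retires).
    ready: mutable sequence of READY flags, one per SOCCB row.
    handlers: {ready_value: function(row) -> new_ready_value}.
    Each handler must move its row to a value CONSOL does not handle,
    otherwise the loop would never terminate."""
    serviced = []
    while True:
        # Lowest-numbered row first: the arbitrary but adequate choice.
        row = next((r for r, v in enumerate(ready) if v in handlers), None)
        if row is None:
            return serviced
        ready[row] = handlers[ready[row]](row)
        serviced.append(row)


# Hypothetical transitions, loosely following Table 2.
HANDLERS = {
    1: lambda row: 2,    # new console: assign user vector, unlock keyboard
    3: lambda row: 20,   # user line received: ask CYCLE to crack the text
    22: lambda row: 2,   # text cracked: notify user, unlock keyboard
}
```

After one activation, rows left at values such as 20 or 6 are exactly the ones CYCLE will find on its own scan, which is how the two routines hand consoles back and forth.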

SMTLATCH, LATCH, CONSIN, and CONSOT

Fig. 7

CONSOL

Fig. 8



It is apparent from a scan of the various possible needs that some
are more urgent than others. For CONSOL, however, needs are satisfied so
quickly that the arbitrary selection of the console highest in the SOCCB is
adequate. CONSOL works so fast that even if all 256 users were on-line and
all had a need at the same instant, and the first user were serviced first,
the last user would be satisfied before the transmission to the first user
was complete. In actuality, in most cases, only one user will need service
at any given time. The obvious exception to this is after CYCLE completes
a task; at that time, several consoles will need transmissions. It is
immaterial, however, which console is satisfied first, since all consoles
will be satisfied by CONSOL in much less time than was taken by CYCLE.

F) A Detailed View of CYCLE 

In contrast to CONSOL, CYCLE follows a strict pattern in deciding
what to do. Like CONSOL, CYCLE uses the TRT instruction, but CYCLE first
decides what process to do. Then it sees which consoles need that process.
If no console needs that process, CYCLE tries the next process in its list
of processes. To permit on-line access to more than one collection for
test purposes, or access by sophisticated users with special needs, each
process is run for all consoles that request one collection and then for
all consoles that require another collection. This is illustrated in
Fig. 9.
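The strict ordering, together with the per-collection grouping, can be sketched as one pass over a fixed list of processes. In the Python sketch below, the process list reuses the Table 2 block bases (20, 24, 40), each console is a dict with hypothetical `ready` and `collection` keys, and `run_process` is a placeholder for the actual program; while a batch is being run its consoles carry the being-considered value, and afterwards they are handed to CONSOL.

```python
# READY values that request each CYCLE process, in CYCLE's order of preference
# (block bases from Table 2: crack text, pre-search display, centroid search).
PROCESS_ORDER = [20, 24, 40]


def cycle_pass(consoles, run_process):
    """One pass of CYCLE over its process list."""
    for request in PROCESS_ORDER:
        waiting = [c for c in consoles if c["ready"] == request]
        if not waiting:
            continue                                  # try the next process
        by_collection = {}
        for c in waiting:
            by_collection.setdefault(c["collection"], []).append(c)
        for collection, batch in by_collection.items():
            for c in batch:
                c["ready"] = request + 1              # being considered by CYCLE
            run_process(request, collection, batch)   # one run per collection
            for c in batch:
                c["ready"] = request + 2              # now needs CONSOL
```

Grouping by collection before running each process is what lets one file set be opened once for every console that needs it, which is the pseudo-batching gain described earlier.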

Some of the processes called by CYCLE are standard programs
used for batch experimentation: text cracking, centroid tree searching,
document correlation, and query redefinition. The processes unique to the
on-line system divide into two classes: those that access files for the
user and those that service CONSOL. There are presently two programs of
the first type: one to display pre-search information, e.g., thesaurus
categories, and one to display post-search material, e.g., abstracts. Since
CONSOL operates on an interrupt basis, it cannot allocate resources for
itself. However, CONSOL does need to be able to obtain core storage space on
demand. To provide this, CYCLE can be asked to allocate storage for a console
and return control to CONSOL.

CYCLE

Fig. 9

From the flowchart for CYCLE shown in Fig. 9, it can be seen that
CYCLE restarts CONSOL without testing if CONSOL is running. This is
possible since CYCLE can use the CPU only when CONSOL is inactive.

6. Summary 

On-line information retrieval is implemented by two co-routines,
CONSOL and CYCLE. The former operates in the real time of the console user,
providing rapid response; the latter operates in the process time inherent in
any routine that needs to access auxiliary storage, providing realistic costs
for work done. The two routines communicate through a single area of
mutually known core.

This system should prove adequate for both experimentation and real-time
use in a library, and for both the novice user and the sophisticated
researcher with a complex problem.
Appendix



TRT READY,TABLEn   (instruction image; n = 1, 2, 3 or 4)


Location:  READY  + 0 1 2 3 4 5 6 7
Contents:           5434200 'j1

Location:  TABLE1 + 0 1 2 3 4 5 6
Contents:           0 0 0 6 0 0 0

Location:  TABLE2 + 0 1 2 3 4 5 6
Contents:           0 0 0 0 9 0 0

Location:  TABLE3 + 0 1 2 3 4 5 6
Contents:           0 2 3 0 0 0 0

Location:  TABLE4 + 0 1 2 3 4 5 6
Contents:           0 8 0 0 0 0 0




Execution : 


1st 


2nd 


3rd 


Register : 


0 1 cc 


0 1 cc 


cc 


n 

1 

2 

3 

4 


5 (READY)+2 1
9 (READY)+1 1
3 (READY)+4 1
8 (READY)+6 2


0

9 (READY)+3 1
2 (READY)+6 1

i 


0 

0 

1 




(READY) means the address of READY; cc = condition code



The Effects of the Translate and Test Instruction (TRT)
When the Vector READY Is Entered Against Several Tables
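The appendix table can be checked against a small simulation of TRT. On the System/360, TRT scans the operand left to right, uses each byte as an index into a 256-byte table, and stops at the first nonzero table entry (the function byte). The address of the operand byte goes into general register 1, the function byte into the low-order byte of register 2, and the condition code is 0 if no nonzero entry was found, 1 if one was found before the last operand byte, and 2 if it was found at the last byte. The Python sketch below returns those three values as a tuple, using an offset instead of an address. The operand value in the test is hypothetical, chosen only for illustration, and is not read from the appendix.

```python
from typing import Dict, List, Optional, Tuple

def make_table(entries: Dict[int, int]) -> bytes:
    """Build a 256-byte translate table; unlisted positions hold zero."""
    table = bytearray(256)
    for index, value in entries.items():
        table[index] = value
    return bytes(table)

def trt(operand: bytes, table: bytes) -> Tuple[Optional[int], int, int]:
    """Simulate one TRT execution: (offset of stopping byte, function byte, cc).

    cc = 0: every function byte was zero (offset is None);
    cc = 1: a nonzero function byte was found before the last operand byte;
    cc = 2: the nonzero function byte belongs to the last operand byte.
    """
    for offset, byte in enumerate(operand):
        function_byte = table[byte]
        if function_byte != 0:
            cc = 2 if offset == len(operand) - 1 else 1
            return offset, function_byte, cc
    return None, 0, 0

def scan_all(operand: bytes, table: bytes) -> List[Tuple[int, int, int]]:
    """Re-execute TRT just past each hit, as a program would in a loop."""
    hits = []
    start = 0
    while start < len(operand):
        offset, function_byte, cc = trt(operand[start:], table)
        if cc == 0:
            break
        hits.append((start + offset, function_byte, cc))
        start += offset + 1
    return hits
```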
XV. Template Analysis in a Conversational System

S. F. Weiss



Abstract 

This study presents a discussion of natural language 
conversational systems. The use of natural language rather than 
fixed format input in such a system makes possible the imple- 
mentation of a natural dialogue system, and renders the system 
available to a wide range of users. A set of goals for such 
a system is presented. These include the provision of fast 
responses, usable by all levels of users, and the use of intel- 
lectual aids such as tutorials. 

An experimental conversational system which meets these
goals is implemented using a template analysis process. Template
analysis is used not only to analyze natural language
input, but also to control the overall operation of the process.
Experiments with a number of users show that the system is easy
to use and provides accurate analyses. A detailed discussion
of both user and system performance is presented.

1. Motivation 

Programs and data are normally entered into a computer in
a batch processing mode. However, the recent trend in computer
system design has been toward the development of large time-shared
systems which give a number of users simultaneous on-line
access to the computer. This makes possible the implementation
of conversational programs which permit real-time man-machine
dialogues. Such conversational programs are both useful and
necessary to cope with the ever-expanding complexity of
computerized data processing tasks. Consider, for example, an
on-line programming language such as APL. The ability to test
and debug a program on-line is an aid to the programmer. Errors
are more easily located and may be corrected immediately. In
addition, on-line data entry allows the programmer to adjust
parameters and data while the program is running in order to
get the desired results.

Conversational programs are also useful in all forms of
language processing, and especially in information retrieval.
Consider, for example, a case in which a natural language analysis
program encounters an unresolvable ambiguity. In the batch
mode, the program would be forced either to give up or to
proceed using the multiple interpretations. But in a conversational
mode, the system can ask the user for clarification
and then proceed with perfect information, as is shown in the
example in Fig. 1.



U: TYPE 2 GRAMMARS

S: YOU HAVE USED TYPE AMBIGUOUSLY. PLEASE SPECIFY:

   A. PRINTING

   B. VARIETY

U: B

S: PROCEED



User Disambiguation
Fig. 1






In information retrieval the applicability of conversational
programs is very broad. Conversational operation is the only way to
make the retrieval operation fast enough for practical use. In addition,
it permits the user to see results immediately and adjust his query and
other search parameters to tailor the performance to his exact
needs. The conversational mode is also the best framework in
which to implement the relevance feedback process [11,24].

In general, the conversational facility is an extremely powerful
information retrieval tool.

Section 2 of this study discusses some existing on-line
systems. Most of them require a fixed format input. But the
current trend in information processing is toward natural
language input. Not only does this permit the treatment of
documents and queries in their original form, but it also makes
the on-line facility available to a broad spectrum of potential
users. This is especially important since on-line systems
permit remote access from places such as libraries and schools,
which are not inhabited strictly by computer people. This
study discusses conversational systems in general and presents
a natural language facility for information retrieval.

There are four basic goals which any such natural language
conversational system should meet. First, the system obviously
must accept natural language input. Second, it must provide
fast response. Users tend to become impatient if the delay
between the submission of a command and the system's response
exceeds a few seconds. Third, the system should be
usable by all levels of users. Inexperienced users should be
able to perform useful work. At the same time, the system must
not hamper the expert with excessive verbosity and unwanted
material. And finally, the system should provide some intellectual
aids, such as tutorials and prompts, which can help the user
conduct a useful dialogue.

2. Some Existing Conversational Systems 

Many conversational systems are currently in operation.
Most are part of a larger implementation such as an information
retrieval system. But a few, such as ELIZA, are designed solely
to perform conversation. The major differences among the
conversational aspects of the various systems lie in the amount of
man-machine interaction permitted. In some systems the on-line
input is not far removed from batch input, and the user has little
control over the running of the process. At the other extreme
are systems in which the user is directly linked to the process
and is continuously in command of program operation. The
discussion of on-line systems presented below is roughly in order
of increasing complexity of dialogue.

The most basic type of conversation consists of a simple
user input which results in some appropriate system action being
performed. RECON [16], DIALOG II [29], TIP [15], and AUTONOTE
[22] are representative of this type of conversation. In
RECON, for example, the user presses a button which indicates
the desired operation and then types the operands on the
console. In the other systems the user types the operator name
followed by operands. Thus all these processes require a fixed
input format. In addition, should the user become lost or
confused, the systems cannot supply any intellectual aid to help
him out. One type of user aid, the tutorial, is a feature of
the AUDACIOUS system [2]. In addition to the normal operator-operand
commands like those above, AUDACIOUS permits two special
commands: HELP and PUNT. In response to these, the system
produces a tutorial message appropriate to the user's position
in the dialogue. In this way the confused user can receive help.

A second type of intellectual aid is the prompt. SPIRES
[21] is an example of a system which uses the prompting feature.
Unlike tutorials, prompts are presented without user request.
Their purpose is to indicate to the user what type of
information is to be specified in the current input. However, since
prompts are presented without a user request, they can sometimes
be a nuisance to the expert user. All the conversational systems
presented thus far share two attributes. First, they all require
fixed form input. Second, they are all information retrieval
systems, and hence the conversational operation was not the prime
consideration in their development. The systems discussed
in the next few paragraphs are designed basically to conduct
conversation in natural language.

Probably the most famous natural language conversational
system is Weizenbaum's ELIZA [3^]. The program conducts a
coherent dialogue with the user much like that between a
psychotherapist and his patient. Inputs are searched for the
presence of certain keywords and structures. These indicate
the type of output appropriate to the input. For each input
form there is more than one allowable response. ELIZA cycles
through this set, thus eliminating repetition and producing a
more realistic-looking conversation. The approach to
conversation used in the system presented later in this study is
similar to the ELIZA concept.
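ELIZA's keyword-plus-cycling idea can be reduced to a toy Python sketch. It is a deliberate simplification: the real ELIZA also ranked keywords and used decomposition and reassembly rules, which are omitted here, and the class name, keywords and responses below are hypothetical.

```python
import itertools
import re
from typing import Dict, Iterator, List

class KeywordResponder:
    """Toy ELIZA-style responder: keyword lookup plus response cycling."""

    def __init__(self, rules: Dict[str, List[str]], fallback: List[str]):
        # One endless cycle of responses per keyword, so the same keyword
        # does not produce the same reply twice in a row.
        self._cycles: Dict[str, Iterator[str]] = {
            keyword: itertools.cycle(responses) for keyword, responses in rules.items()
        }
        self._fallback = itertools.cycle(fallback)

    def respond(self, text: str) -> str:
        # Use the first word of the input that is a known keyword.
        for word in re.findall(r"[A-Z']+", text.upper()):
            if word in self._cycles:
                return next(self._cycles[word])
        return next(self._fallback)
```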

Another area of usefulness for conversational capabilities
is computer assisted instruction (CAI). One such conversational
CAI system is Colt's Socratic Instructor [6]. Its operation
is basically an extension of the techniques outlined for ELIZA.
Like ELIZA, the Socratic Instructor uses the user's position in
the dialogue along with the input to determine the proper
response. In addition, the Socratic Instructor remembers all
previous user inputs and dialogue points. These are also used in
output determination.

Most conversational systems in existence today are implemented
by basically ad hoc programming methods. This is not
unusual for a fairly new area such as conversational programs.
However, as on-line systems become more common, higher level
implementation processes must be developed. One such process
already in existence is the LYRIC system developed by Silvern
[26]. This is a programming language for describing
conversational CAI programs. With processes such as this, the
conversational implementor is relieved of some of the ugly
programming details in much the same way as a compiler-compiler aids
the systems programmer.

The conversational systems presented here by no means
constitute the complete set. They are, however, representative of
most systems. It appears that systems such as TIP and SPIRES,
which perform efficient on-line information retrieval, require
a highly structured input format. On the other hand, those such
as ELIZA which permit natural language input have a very weak
concept of understanding. It would be desirable to develop
a system which combines the best attributes of both, that is,
a fast and accurate information system which allows natural
language input. This is the topic of the following sections.

3. Goals for a Proposed Conversational System 

This section discusses the design considerations that go
into the development of a new conversational information
retrieval system. Some elements of the new system are drawn
from existing facilities while others are new. The primary
goal of this system is to allow a user to conduct a natural
language dialogue with the system. The only limitation is that
the input be restricted to an information retrieval context.

Not only should the user be allowed to specify natural language
commands, but also there should be no restriction on the number
of commands per line, as there is in most other conversational
systems. An input such as

USE THE COSINE CORRELATION ON THE CRANFIELD COLLECTION.

should be perfectly legal. Of course, there may be some inputs
for which natural language is impossible or impractical and a
fixed format input must be used. For example, the user should
be required to specify a fixed form "SIGNOFF" in order to
prevent accidental termination of the conversation. But these
formatted inputs should be kept to a minimum. Another goal for
this system is to be able to resolve ambiguities occurring in
the user's input automatically. In addition, the system must meet
the requirements specified in section 1. These include providing
fast response, being usable by all levels of users, and providing
intellectual aids such as tutorials and prompting.

This proposal makes demands on the user as well as the
system. First, the basis for learning the system is a manual.
It would be aesthetically pleasing to allow the system itself
to contain a computer aided instruction (CAI) facility, which
would make the system completely self-contained. Unfortunately,
this is impractical. Successful CAI requires concentrated and
frequent exposure to the teaching medium. It appears that the
typical information retrieval user dialogue will be both brief
and fairly infrequent. Also, trying to teach the user at the
console unnecessarily ties up the facilities. Thus an off-line
approach to learning the system seems more reasonable.

While no CAI facility is provided, the system should offer a
prompting option by which a user can be led, step by step, through
a simple retrieval process. In this way the user may learn
something about the system while actually performing useful
retrieval work. The user's manual for this system is divided
into several sections, each dealing with system use in
progressively greater detail. A user need only read those parts which
satisfy his particular need. A casual user who wants only simple
retrieval operations using system defaults has to read only a
few pages, and the prompting facility can be used with only
a paragraph or so of instruction.

Another user problem that must be treated is the separation
of novices and experts. As is often the case, conversational
systems are handled by users with widely varying degrees of
expertise. The system should neither hamper the expert with
excessive verbosity nor hinder the novice with obscure and terse
responses. Some systems compromise and use a "middle of the
road" approach, but this satisfies no one. Other systems have
multiple sets of dialogue scripts. A user is classified as having
a particular level of proficiency, and he receives the dialogue
appropriate to that level. But this too can lead to problems.
In any large facility such as an information retrieval system,
it is entirely possible for a user to be very proficient in
some but not all areas of the system. Classifying him strictly
as a novice or as an expert is wrong in either case. To solve this
problem, the proposed system uses an implicit rather than
explicit separation of novice and expert. This is accomplished
by allowing access to options only when the user asks for them.
Thus the more the user knows about the system, the more
facilities he has at his disposal. The novice is thereby protected
from options which he does not understand. Tutorials are also
presented only on request. Because of this, only a single set
of tutorials is needed, and they can be reasonably long and clear.
The expert user who does not ask for a tutorial need never see
any and thus is not hindered by them. The only manifestation
of the novice facilities that an expert must see is the short
question

Do you need help in using the system?



This appears immediately after signon. Even this can be
eliminated by allowing a user status file to be stored between system
uses. Upon signing on, the user's status file is read and
appropriate parameters, including his negative answer to the above
question, are set.

A few other characteristics of the proposed system also
help in the proper handling of both novice and expert users.
These are the multi-step processing technique and the ability
to compound commands on a single line. An expert, for example,
can put several system commands into a single input, thus saving
time and effort. The same commands may also be split over a number
of lines for greater clarity. This and the multi-step process
are discussed in greater detail in section 

One final goal of the proposed system is the presentation
of useful tutorials. These messages must be easily available
so that even the most confused user can get help. One simple
method is to use a single question mark as the tutorial
request. The tutorials must reflect the specific place in the
dialogue where they are called. In addition, they must take
into consideration the commands and options that the user has
already specified. Tutorials are also useful in treating errors.
When an erroneous input is detected, the system automatically
produces a tutorial appropriate to the place where the error
occurs. The incorrect input is an implicit indication that the
user needs help, and thus the tutorial is appropriate at that
point.
The design considerations presented in this section are
basically nontechnical. They stem from an effort to satisfy,
within practical limits, the basic conversational needs of the
largest possible user population. The next section presents
a discussion of the actual implementation of such a system.

4. Implementation of the Conversational System

This section discusses the implementation of the
conversational system. The major obstacle in the process is the fact
that the Cornell University Computing Center has, at present,
no facilities for user implementation of on-line systems.
The programs thus must all be run in the conventional manner
with batched input. This poses no real problem in the design
and operation of the system except in the area of testing it on
real users. But even this can be circumvented with adequate
simulation.

A) Capabilities 

The conversational system is designed to perform SMART-like
information retrieval operations. The capabilities built
into the present system include specification of a correlation
coefficient, search strategy and collection to be used. The
first two of these are provided with default values that are
used if nothing is explicitly specified by the user. There is
provision for submitting a query containing a number of data
base entry point references (subject, date, journal, and author).
A search can be initiated, and the user can request to see any
number of retrieved documents. In addition to these information
retrieval operations, the user has available some commands to
the conversational system itself. These include requesting a
tutorial, asking to be guided through a retrieval operation,
and signing on or off. A few other information retrieval
operations, most notably relevance feedback, are deliberately omitted,
since the system is designed to test the conversational and
natural language capabilities, and not to retest the
information retrieval techniques. The set of capabilities is selected
as typical of the inputs, outputs and internal processes
required in a larger system. Also, relevance feedback is not
conducive to handling in natural language. While a user might
introduce a natural language input which indicates his desire
to perform relevance feedback, the actual submission of
relevance judgements is best handled in a fixed format. Relevance
feedback and a few other capabilities would add little to the
significance of system experimentation and hence are omitted.

B) Input Conventions

While it is the aim of this system to allow natural
language input, there are a few places where the use of natural
language is impractical. This is usually caused by the physical
characteristics of the conversational system or of information
retrieval in general. One such instance is in setting off a
query from other types of input. A query may deal with any
subject area. For example, it could ask for information about
some aspect of a conversational system. It could thus be
indistinguishable from a legal system command. For this reason,
the user, rather than the system, must perform the discrimination
between queries and commands. This is accomplished by simply
prefacing each query with "QUERY" or "Q". This adds little to
user effort and eliminates what might be an impossible system
task. Another area where fixed format is necessary is in
search initiation. Unlike other operations in a conversational
system, which require only a few computer cycles, the search is
relatively costly in computer time. It is therefore desirable
to avoid uncalled-for searches. Also, searches should not be
initiated until the user is satisfied with his query and search
specifications. For these reasons, searches are performed only
upon an explicit signal ("GOSEARCH") from the user. A third
fixed format input is the request for a tutorial. This is
accomplished by typing a single question mark ("?"). This is
done strictly for user convenience. In this way, even the
most confused user can receive a message appropriate to his
present dialogue position. Tutorials are also automatically
generated when a user introduces an incorrect input. The final
fixed form input is the SIGNON command. In an actual on-line
implementation, it is quite possible that this command will
be handled by a supervisor program which controls all on-line
operations. Thus the natural language analysis facility may not
be present to process this input. The remainder of the inputs
may be posed in natural English.
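The division between fixed-format and natural language input can be sketched as a small pre-filter in Python. The function name `classify_input` and the returned labels are hypothetical; the sketch assumes that GOSEARCH, SIGNON and SIGNOFF (the last mentioned in section 3) are typed as bare keywords, and that a query begins with the word QUERY or Q followed by a space. Everything else is passed on to natural language analysis.

```python
from typing import Tuple

FIXED_KEYWORDS = ("GOSEARCH", "SIGNON", "SIGNOFF")

def classify_input(line: str) -> Tuple[str, str]:
    """Split a raw input line into (kind, payload). Hypothetical helper."""
    text = line.strip()
    upper = text.upper()
    if upper == "?":
        return "tutorial", ""            # tutorial request
    if upper in FIXED_KEYWORDS:
        return upper.lower(), ""         # fixed-format command
    first, _, rest = text.partition(" ")
    if first.upper() in ("QUERY", "Q"):
        return "query", rest.strip()     # user-marked query text
    return "command", text               # left for natural language analysis
```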

C) The Structure of the Process 

The structure of the conversational system may be viewed
as a graph. The nodes represent user decision points and the
edges represent possible alternatives and system actions. As
the user progresses through his dialogue, he moves from node to
node in the graph. The action is much like that of a finite
automaton. At every point in the dialogue, the user is at some
system node. The combination of this current node and the user's
input at that point determines the action to be performed
(analogous to the output of the automaton) and the node to which
control is passed after the action is completed. This strategy
allows the system to be thought of as a set of modular units.
Each unit corresponds to a node, and each has associated with it
the subset of inputs that are legal at that point, as well
as the associated actions. The input processing is thus greatly
simplified, since at each node the system need only test for
those inputs that are legal. All other inputs are illegal, even
though they might be acceptable at some other point in the
dialogue. The modular approach also facilitates some degree of
disambiguation. Some inputs are ambiguous when considered with
respect to the total set of system inputs. However, many become
unambiguous within the context of a single node. The simplest
example is the tutorial request ("?"). The question mark by
itself is not enough to determine which of the many tutorials
is desired. But the combination of the question mark and the
current node performs the disambiguation, and the proper
message is presented.
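The node-and-input lookup described above behaves like the transition function of a finite automaton, and can be sketched as a Python dictionary keyed by (node, input). The entries are hypothetical. The node numbers follow the paper's correlation example (node 2 as supervisor, node 12 for correlation), and inputs are shown as words rather than concept numbers. The same "?" input yields a different tutorial at each node, and an input legal at one node is rejected at another.

```python
from typing import Dict, Tuple

# Hypothetical transition table: (current node, input) -> (action, next node).
# Only the pairs listed for a node are legal there.
TRANSITIONS: Dict[Tuple[int, str], Tuple[str, int]] = {
    (2, "CORRELATION"): ("select correlation", 12),
    (2, "?"): ("print node 2 tutorial", 2),
    (12, "COSINE"): ("set cosine", 2),
    (12, "?"): ("print node 12 tutorial", 12),
}

def step(node: int, token: str) -> Tuple[str, int]:
    """Apply one input at a node; an illegal input leaves the node unchanged."""
    return TRANSITIONS.get((node, token), ("illegal input", node))
```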

D) Template Analysis in the Conversational System 

There are two main jobs to be performed in a natural
language conversational system. The first is the natural language
analysis required to transform the input to a machine-usable
form. The second job is bookkeeping. The system must keep
track of the user's present position in the dialogue and the
legal inputs, as well as the successor node associated with
each input. It seems relatively clear that the template analysis
process introduced by Weiss [31] is sufficient to handle the
natural language analysis task. The expected input consists of
queries and system commands coming from some sort of on-line
terminal. These thus conform exactly to the user-restricted
input for which template analysis is designed. While more
complex systems would produce a more rigorous analysis of the
input, template analysis can provide all the information that
is needed from the input, at a considerable saving in time
over other methods. Thus template analysis appears to be the
ideal natural language analysis technique for this application.
Upon first analysis the bookkeeping task seems outside the
realm of template analysis. But actually, the most efficient
way in which to implement this task is to embed it within the
template analysis structure. This is done as follows. Each
template is applicable to only one node, which is called its
host node. This is indicated by appending the host node number
to the template concept numbers. Since template concept numbers
range from 11 to 999, this appending can be accomplished by
adding the desired node number times 1000 to the concept number.
Each template contains a set of concept numbers, a key word,
and a link to an action routine that is to be executed if that
template is matched. Some additional information must be added
for the conversational application. Each template must contain
a next node, immediate (NNI) number, which gives the node to
which control is to be transferred immediately after execution
of the associated template action. It is sometimes useful to
defer transferring to a new node until all possible executions
of the template action have been performed, for example in
cases where a number of similar pieces of information must
be picked up from one input. In this case, NNI refers to the
host node. A second value, the next node, final (NNF), then
indicates the node to which control is transferred after all
actions at the current node are complete. In the examples in
Fig. 2 below, templates A and B are both applicable only in node
5, and both match the same input substring. After matching,
however, template A calls action routine 51, and control is
then immediately transferred to node 2. Template B causes
action 55 to be performed, and control remains at node 5.
Finally, after all possible node 5 matches have been processed,
control passes to node 3. In cases such as A, where NNI causes
a transfer to a node other than its own, the NNF value is
ignored.
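The host-node encoding and the NNI/NNF rules can be expressed compactly. The Python sketch below reproduces the two templates of Fig. 2. The `Template` class and the `with_host`, `host_of` and `resolve` helpers are hypothetical, and the rule that control goes to the last NNF seen when several same-node templates match is an assumption, since the text does not say what happens when they disagree.

```python
from dataclasses import dataclass
from typing import List, Optional

def with_host(node: int, concept: int) -> int:
    """Tag a concept number (11..999) with its host node: node * 1000 + concept."""
    if not 11 <= concept <= 999:
        raise ValueError("concept numbers range from 11 to 999")
    return node * 1000 + concept

def host_of(tagged: int) -> int:
    """Recover the host node from a tagged concept number."""
    return tagged // 1000

@dataclass
class Template:
    name: str
    concepts: List[int]        # concept numbers already tagged with the host node
    action: int                # number of the action routine to call
    nni: int                   # next node, immediate
    nnf: Optional[int] = None  # next node, final; None stands for "-" in Fig. 2

def resolve(host: int, matched: List[Template]) -> int:
    """Node that holds control after the given templates match at `host`.

    A template whose NNI leaves the host transfers control at once, and its
    NNF is ignored. Otherwise control goes to the last NNF seen (assumed),
    or stays at the host if no NNF was given.
    """
    final = host
    for template in matched:
        if template.nni != host:
            return template.nni
        if template.nnf is not None:
            final = template.nnf
    return final
```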





TEMPLATE   NNI   NNF   ACTION   TEMPLATE CONCEPTS
A          2     -     51       5011, 5012, 5013
B          5     3     55       5011, 5012, 5013

Sample Conversational Templates
Fig. 2

In order to match the proper templates, the input must be
made to reflect the current node (CNODE) in the dialogue. Upon
reading an input, the current node times 1000 is added onto each
input concept. Also, after every node change, the old node
number is stripped off the input and the new node times 1000
added on. Thus the input reflects the current node in exactly
the same way in which the templates reflect their host nodes,
and hence proper matching occurs. In this way the template
process itself keeps track of the current node, the legal inputs
for each node and the successor node function. This operation
is summarized in the schematic in Fig. 3. An input is read
and each word is assigned a numeric concept by a dictionary
lookup. The input is then set to reflect the current node i.
A scan is made of the entire template set in search of a match.
However, only those templates whose host node is i have any
chance of matching. If a match in this subset is found, the
associated action is performed and the next node path is
followed.
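The node stamping of the input, and the way it limits matching to templates hosted at the current node, can be sketched as follows. The concept numbers are hypothetical, and the matching rule used here (template concepts must appear in the input in order, gaps allowed) is an assumed simplification of template analysis, not its exact definition.

```python
from typing import Dict, List

def rebase(concepts: List[int], node: int) -> List[int]:
    """Strip any old node number and stamp `node` (times 1000) onto each concept."""
    return [node * 1000 + concept % 1000 for concept in concepts]

def matches(template: List[int], tagged_input: List[int]) -> bool:
    """Assumed rule: template concepts appear in the input in order (gaps allowed)."""
    remaining = iter(tagged_input)
    # `in` on an iterator consumes it, so each concept must follow the previous one.
    return all(concept in remaining for concept in template)

def candidates(templates: Dict[str, List[int]], tagged_input: List[int]) -> List[str]:
    """Names of matching templates; only those hosted at the input's node can match."""
    return [name for name, concepts in templates.items() if matches(concepts, tagged_input)]
```

Because a template tagged for node 5 and an input tagged for node 2 share no concept numbers, templates from other nodes are ruled out without any explicit node test.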

Fig. 4 indicates the node structure of the conversational
system. Node 2 is the supervisor. After the initial signon
phase, operations generally start and end in node 2. Most
operations are two-step processes. First, in node 2, the input
is analyzed and the type of operation that it specifies is
determined. Control then passes to the appropriate new node.
Second, in this new node, the exact operation is determined and
executed. Control is then returned to node 2. As an example,
consider the input

USE THE COSINE CORRELATION. 

In node 2, it is determined that a correlation is to be specified,
and control passes to node 12. In node 12, the specific
correlation coefficient (i.e., cosine) is detected and noted.
Control then goes back to node 2.

Schematic of Conversational Operation (T = template, A = action, N = next node)
Figure 3

Conversational Node Structure
Figure 4

There is no necessity that the commands for two-step
operations appear on the same input line. For example, simply
typing "CORRELATION" causes a transfer of control from node 2
to node 12. The system then waits in node 12 for further instructions.
Strictly for the sake of convenience, a special feature is used
in cases like this. Whenever the system finds itself waiting
in a node other than 2, it knows that an incomplete input has
been entered. A special routine is therefore called to print
a message appropriate to the current node. This aids the user
in completing the input, as is shown below. In this example
and in all other samples of conversational scripts, user input
is identified by a leading "U:".

U: CORRELATION 

SPECIFY A CORRELATION 
U: COSINE 

Not only can inputs be spread out over several lines;
several inputs can also be compounded onto a single line.
For example:

U: PERFORM A FULL SEARCH ON THE PHYSICS COLLECTION
WITH THE COSINE CORRELATION.

As is seen in the detailed flow chart in Fig. 5, once an input
is read, it is processed repeatedly until all valid template
matches are exhausted. This results in an exit from box 6 via
failure. Since this same exit is taken regardless of how many,
or how few, template matches occur in the input, a test must be
made to see if at least one match occurred (box 12). If not,
the input is not valid and a diagnostic must be presented to
the user. The system prints a short general error message,
erases the current input and replaces it by a question mark.
Control is then passed back to the input analysis section.
This results in the appropriate tutorial being shown to the
user. This process of supplying diagnostics by allowing the
system to force in a special input and then treating this as a
normal user input is also used in the implementation of the
guide facility, which is discussed below.

Conversational Control Algorithm
Figure 5
(Boxes shown include: 12. Did the input have at least one match? 13. Print error message. 14. Force "?" input. 15. Is guide on? 17. CNODE = NNFIN. 18. Force guide input # GCOUNT. 19. GCOUNT = GCOUNT + 1.)

NOTE: NNFIN is the next node, final value. It is initialized to -1
before template matching begins. If no template matches are found,
it will still be -1 at box 16. This indicates that control is to
remain at the current node.
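The control loop summarized in Fig. 5 can be approximated in Python. This is a sketch under several assumptions: inputs are lists of normalized words rather than node-tagged concept numbers, a successful match consumes the words it matched (so that repeated processing terminates), templates are tried in list order, and the action routines are represented by labels written to a log. `NodeTemplate` and `run_input` are hypothetical names. NNFIN starts at -1, and an input with no match produces an error message followed by the tutorial that a forced "?" would produce at the current node.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class NodeTemplate:
    host: int                  # node in which the template is legal
    words: List[str]           # simplified: normalized words, not concept numbers
    action: str                # label standing in for the action routine
    nni: int                   # next node, immediate
    nnf: Optional[int] = None  # next node, final

def _find(pattern: List[str], seq: List[str]) -> Optional[int]:
    """Start index of `pattern` as a contiguous run in `seq`, or None."""
    n = len(pattern)
    for i in range(len(seq) - n + 1):
        if seq[i:i + n] == pattern:
            return i
    return None

def run_input(cnode: int, words: List[str], templates: List[NodeTemplate],
              log: List[str]) -> int:
    """Process one input line and return the node that then holds control."""
    remaining = list(words)
    nnfin = -1                 # -1 means "remain at the current node"
    matched_any = False
    while True:
        hit, start = None, None
        for template in templates:
            if template.host == cnode:
                start = _find(template.words, remaining)
                if start is not None:
                    hit = template
                    break
        if hit is None:
            break              # matches exhausted: the "failure" exit
        del remaining[start:start + len(hit.words)]  # assumed: a match consumes its words
        log.append(hit.action)
        matched_any = True
        if hit.nni != cnode:
            cnode, nnfin = hit.nni, -1     # immediate transfer; NNF ignored
        elif hit.nnf is not None:
            nnfin = hit.nnf
    if not matched_any:
        log.append("ERROR")                # short general error message
        log.append(f"TUTORIAL {cnode}")    # effect of forcing "?" at this node
        return cnode
    return cnode if nnfin == -1 else nnfin
```

With the two templates used in the test, the input USE THE COSINE CORRELATION passes through node 12 and back to node 2 in one line, while CORRELATION alone leaves control waiting in node 12, as described above for two-step operations.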

E) The Guide Facility

In the original proposal for this system, a desire is expressed to provide a prompting facility to guide a novice user, step by step, through an actual retrieval operation.

When a user signs onto this conversational system, he receives a brief introductory message:

Do you need help in using this system? 



If the user is familiar with the system, he can simply answer NO, and he sees no more of the prompting script. If his answer is YES, he receives a somewhat longer introduction to the system (see Fig. 6) and is then asked if he wishes to be guided through a retrieval operation. If not, the system operates normally and no prompting is given. If, on the other hand, his answer to the second question is YES, the guide facility is turned on. The guide subroutine has a set of special strings of the form

<general operation> ?

These include, for example:

CORRELATION ?
SEARCH ?
QUERY ?
etc.

Each time the guide subroutine is called (see Fig. 5, boxes 10 and 19), it forces its ith string into the input area, increases i by one, and transfers control back to the input analyzer. These special inputs have the effect of performing the first half of a two-step operation and then generating a tutorial. All the user has to do is respond in turn to each tutorial, thus completing the second half of the two-step process. When the guided retrieval process is finished, i is reset to one and the user is asked if he wants to be guided again.
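The guide subroutine's bookkeeping amounts to a cursor over a fixed script. The following is a minimal sketch, assuming the three example strings given above in that order. The class name `GuideFacility` and the method `next_forced_input` are hypothetical, and the counter `i` plays the role of GCOUNT in Fig. 5. Each returned string (for example `"CORRELATION ?"`) would be placed in the input area as though the user had typed it. The operation word performs the first half of the step, and the `"?"` produces the tutorial that the user answers.

```python
from typing import List, Optional

# Special strings of the form "<general operation> ?". The text gives
# CORRELATION ?, SEARCH ? and QUERY ? as examples; this order is assumed.
GUIDE_STRINGS = ["CORRELATION ?", "SEARCH ?", "QUERY ?"]


class GuideFacility:
    """Hypothetical model of the guide subroutine and its counter i (GCOUNT)."""

    def __init__(self, strings: List[str]) -> None:
        self.strings = list(strings)
        self.i = 1  # 1-based, as in the text

    def next_forced_input(self) -> Optional[str]:
        """Return the ith string to force into the input area, then increase i.

        Returns None once the script is exhausted, after resetting i to one;
        the system would then ask whether the user wants to be guided again.
        """
        if self.i > len(self.strings):
            self.i = 1
            return None
        forced = self.strings[self.i - 1]
        self.i += 1
        return forced
```

Keeping the script as data means that changing what the guide covers, or its order, requires no change to the input analyzer. The forced strings pass through the same template analysis as anything the user types.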

F) Tutorials

There is a tutorial associated with each system node. When the user types a question mark, he is given the tutorial appropriate to his current node. The tutorials for all nodes except node 2 provide instruction on the specific type of input expected. Unlike the other nodes, which have a very limited legal input set, node 2 makes almost all options available. A different and more detailed form of tutorial message is necessitated in this case. The node 2 tutorial consists of two parts: the present status and the available options. The status report provides a summary of the specifications that the user has already made. The available options are presented as a list of tasks that are currently legal. Each option in the list has an identifying letter so that the user may pick it simply by typing the letter.
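The two-part node 2 tutorial and letter selection can be sketched as follows. This is a hypothetical illustration, not the original FORTRAN code. The function names `node2_tutorial` and `pick_by_letter` are invented, and the options are lettered consecutively, whereas the scripts in Fig. 6 skip the letter E. Treating the first token of a reply as a letter is also a simplification: a reply such as "A FULL SEARCH" would be read as option A, which the real system avoids by running every input through template analysis.

```python
from string import ascii_uppercase
from typing import Dict, List, Optional, Tuple


def node2_tutorial(status: List[str], options: List[str]) -> Tuple[str, Dict[str, str]]:
    """Build the two-part node 2 tutorial and a letter -> option table."""
    table = dict(zip(ascii_uppercase, options))  # A, B, C, ... in list order
    lines = ["PRESENT STATUS:", *status, "", "AT THIS POINT YOU MAY:"]
    lines += [f"{letter}. {option}" for letter, option in table.items()]
    return "\n".join(lines), table


def pick_by_letter(reply: str, table: Dict[str, str]) -> Optional[str]:
    """Treat the first token of a reply such as 'H' or 'H. SIGNOFF' as a letter choice."""
    tokens = reply.split()
    if not tokens:
        return None
    return table.get(tokens[0].rstrip(".").upper())
```

Only currently legal tasks are passed in `options`, so the lettered list always reflects what the user may do at that moment.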

Fig. 6 shows some actual scripts produced by the conversational system with various levels of users, ranging from novices who use the guide facility (Fig. 6A) to highly knowledgeable experts. The scripts include both correct and incorrect inputs as well as the various tutorial forms. The program which performs the conversation is written in G-level FORTRAN-IV. It consists of about 1100 FORTRAN statements and includes 35 subroutines and entry points. The program uses two output streams. One is used for diagnostic output, which is useful in debugging and in determining whether the program's internal operations are working properly. The second stream is the conversational output, which would be displayed on the user console. Timing statistics for the system are presented in Section 5.

5. Experimentation 

The experimentation tests the system with actual users and analyzes the results with respect to both system performance and user performance. The user population represents a fairly broad spectrum of potential conversational system users with varying degrees of knowledge of information retrieval. The experiment for each user consists of two parts. First, the user is given a brief introduction to the system. Basically, he is told that this is a conversational information retrieval system and that he must type "SIGNON" to begin. From then on, the user is on his own. The intent here is to see whether the uninitiated user elects the guide option and, if so, whether he is able to complete a retrieval operation successfully using the guide facility. In the second experimental phase, the user tries to be more of an expert. Using information he has learned during the guided operation and some additional instruction, the user performs a second retrieval operation. This second operation is done without the aid of the guide facility. The sample scripts in Figure 6 are the actual results of these experiments with a few of the users. Results must be analyzed with respect to both system and user performance. For the most part, system performance can be measured objectively, while user performance is more subjective.

U: SIGNON

SMART ON-LINE...DO YOU NEED HELP IN USING THE SYSTEM?

U: YES

THIS IS A NATURAL LANGUAGE CONVERSATIONAL
INFORMATION RETRIEVAL SYSTEM. THE SYSTEM RETRIEVES
DOCUMENTS IN RESPONSE TO ENGLISH QUESTIONS. YOU MUST SELECT
A DATA COLLECTION TO BE SEARCHED. IN ADDITION YOU CAN
SELECT SEARCH STRATEGIES AND CORRELATION OR ALLOW SYSTEM
DEFAULTS. IF YOU NEED HELP AT ANY TIME IN YOUR DIALOGUE,
TYPE "?" AND YOU WILL RECEIVE A TUTORIAL APPROPRIATE TO YOUR
PRESENT POSITION IN THE DIALOGUE. FOR FULL INFORMATION
CONSULT THE MANUAL.

DO YOU WISH TO BE GUIDED THROUGH A RETRIEVAL PROCESS?

U: YES

THE FOLLOWING COLLECTIONS ARE AVAILABLE FOR SEARCH
(SELECT BY NAME OR LETTER).

A. ENGINEERING
B. BIOLOGY
C. LITERATURE
D. CHEMISTRY
E. PHYSICS

U: A

THE FOLLOWING CORRELATIONS ARE AVAILABLE. CONSULT
MANUAL FOR FORMULAS.

A. COSINE (DEFAULT)
B. OVERLAP

U: COSINE

THE FOLLOWING SEARCH STRATEGIES ARE AVAILABLE.

A. FULL SEARCH (CONSULT MANUAL BEFORE USING)
B. CLUSTERED (DEFAULT)

U: CLUSTER

YOU MAY NOW ENTER YOUR QUESTION. IT CAN BE IN NORMAL
ENGLISH AND MAY CONTAIN SPECIFICATIONS FOR DATES (YEAR),
JOURNAL, AND AUTHOR.

U: INFORMATION ON ORDERING OF RECURSIVE FUNCTIONS BY THE
U: DIFFICULTY OF COMPUTATION

YOU MAY NOW INITIATE A SEARCH BY TYPING GOSEARCH
U: GOSEARCH

SEARCH UNDER WAY, EXPECT DELAY
SEARCH COMPLETE

CONVERSATIONAL DIALOGUE
Fig. 6A

INDICATE HOW MANY DOCUMENTS YOU WANT TO SEE.
IF NONE, TYPE "NONE".

U: GIVE ME 5

RANK 1 DOC 91 CORR = 0.9896705
RANK 2 DOC 11 CORR = 0.9791623
RANK 3 DOC 42 CORR = 0.8563247
RANK 4 DOC 40 CORR = 0.8002142
RANK 5 DOC 45 CORR = 0.7656324

DO YOU WANT TO SEE MORE?

U: YES 5 MORE

RANK 6 DOC 98 CORR = 0.6665342
RANK 7 DOC 25 CORR = 0.5542100
RANK 8 DOC 32 CORR = 0.4022313
RANK 9 DOC 89 CORR = 0.3987652
RANK 10 DOC 41 CORR = 0.3654201

DO YOU WANT TO SEE MORE?

U: NO

THANK YOU

SMART ON-LINE...DO YOU NEED HELP IN USING THE SYSTEM?

U: NO

U: ?

PRESENT STATUS:

SEARCH ENGINEERING COLLECTION
COSINE CORRELATION
CLUSTERED SEARCH

NO QUERY HAS BEEN SPECIFIED YET

AT THIS POINT YOU MAY:

A. SPECIFY A DOCUMENT COLLECTION
B. SPECIFY A SEARCH ALGORITHM
C. SPECIFY A CORRELATION COEFFICIENT
D. SUBMIT A QUERY
F. LOOK AT SOME RETRIEVED DOCUMENTS
G. START OVER
H. SIGNOFF

U: H. SIGNOFF

SIGNOFF ACCEPTED

CONVERSATIONAL DIALOGUE
Fig. 6A (cont'd)

U: SIGNON

SMART ON-LINE...DO YOU NEED HELP IN USING THE SYSTEM?

U: NO

U: YES

IMPROPER INPUT
HELP FOLLOWS
PRESENT STATUS:

SEARCH PHYSICS COLLECTION
COSINE CORRELATION
FULL SEARCH

NO QUERY HAS BEEN SPECIFIED YET

AT THIS POINT YOU MAY:

A. SPECIFY A DOCUMENT COLLECTION
B. SPECIFY A SEARCH ALGORITHM
C. SPECIFY A CORRELATION COEFFICIENT
D. SUBMIT A QUERY
F. LOOK AT SOME RETRIEVED DOCUMENTS
G. START OVER
H. SIGNOFF

U: Q. 1970 LASER ARTICLES

QUERY SUMMARY: DATE = 1970

U: 4

YOU CANNOT SEE DOCUMENTS UNTIL A SEARCH HAS BEEN PERFORMED
COMMAND IGNORED
U: GOSEARCH

SEARCH UNDER WAY, EXPECT DELAY
SEARCH COMPLETE
U: 4

RANK 1 DOC 56 CORR = 0.9828174
RANK 2 DOC 5 CORR = 0.9620456
RANK 3 DOC [illegible] CORR = 0.802[illegible]
RANK 4 DOC 21 CORR = 0.6553271

DO YOU WANT TO SEE MORE?

U: NO

U: SIGNOFF

SIGNOFF ACCEPTED

CONVERSATIONAL DIALOGUE
Fig. 6B

U: SIGNON

SMART ON-LINE...DO YOU NEED HELP USING THE SYSTEM?

U: NO

U: ?

PRESENT STATUS:

NO COLLECTION SPECIFIED YET
COSINE CORRELATION (BY DEFAULT)
CLUSTERED SEARCH (BY DEFAULT)

NO QUERY HAS BEEN SPECIFIED YET

AT THIS POINT YOU MAY:

A. SPECIFY A DOCUMENT COLLECTION
B. SPECIFY A SEARCH ALGORITHM
C. SPECIFY A CORRELATION COEFFICIENT
D. SUBMIT A QUERY
G. START OVER
SIGNOFF

U: CORRELATION

PLEASE SPECIFY A CORRELATION

U: ?

THE FOLLOWING CORRELATIONS ARE AVAILABLE FOR SEARCH.
CONSULT MANUAL FOR FORMULAS.

A. COSINE (DEFAULT)
B. OVERLAP

PLEASE SPECIFY A CORRELATION
U: B

U: FULL SEARCH OF THE PHYSICS COLLECTION

U: Q: MU MESON PARTICLES
U: GOSEARCH

SEARCH UNDERWAY, EXPECT DELAY

SEARCH COMPLETE

U: I WANT 5 DOCUMENTS

RANK 1 DOC 53 CORR = 0.86547531
RANK 2 DOC 41 CORR = 0.7024357
RANK 3 DOC 49 CORR = 0.6654654
RANK 4 DOC 9 CORR = 0.5557575
RANK 5 DOC 17 CORR = 0.4302142

DO YOU WANT TO SEE MORE?

U: NO

U: SIGNOFF

SIGNOFF ACCEPTED

CONVERSATIONAL DIALOGUE
Fig. 6C

A) System Performance 

The basic measure of system performance is simply how many inputs are handled correctly out of the total number seen. This count can be broken down by source, since inputs arrive from several sources. Most inputs come directly from the user, but some are forced into the input area by the system itself. An input may be legal or illegal. Most illegal inputs are requests for options not accessible at the current node. For a legal input, a correct analysis is produced if the system performs the action intended by the user. For an illegal input, a correct analysis takes the form of noting the error and printing an appropriate message. Figure 7 shows, for each input type, the total number of inputs and the number analyzed correctly and incorrectly.
CONVERSATIONAL ANALYSIS

INPUT      TOTAL   # CORRECT   # INCORRECT   % CORRECT
LEGAL        295         293             2        99.3
ILLEGAL       10           8             2        80.0
FORCED        71          71             0       100.0
TOTAL        376         372             4        98.9

Summary of Conversational System Performance
Figure 7

In addition, it shows the percentage of correct analyses for each input type. These results indicate a very high level of performance for the system. Not only does it handle valid inputs successfully, but it is also able to detect invalid inputs and treat them properly. The total number of inputs shown in Figure 7 is actually greater than the total number of input lines, because several inputs may be compounded onto a single line.
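As a quick arithmetic check of Figure 7, the TOTAL row and each percentage follow directly from the per-type counts. The sketch below uses only the counts in the table. The names `FIGURE_7` and `summarize` are illustrative, and rounding to one decimal place is assumed to match the table's presentation.

```python
from typing import Dict, Tuple

# (total inputs, correctly analyzed) per input type, transcribed from Figure 7.
FIGURE_7: Dict[str, Tuple[int, int]] = {
    "LEGAL": (295, 293),
    "ILLEGAL": (10, 8),
    "FORCED": (71, 71),
}


def summarize(rows: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int, int, float]]:
    """Return (total, correct, incorrect, % correct) per row plus a TOTAL row."""
    def row(total: int, correct: int) -> Tuple[int, int, int, float]:
        return total, correct, total - correct, round(100 * correct / total, 1)

    summary = {name: row(t, c) for name, (t, c) in rows.items()}
    summary["TOTAL"] = row(sum(t for t, _ in rows.values()),
                           sum(c for _, c in rows.values()))
    return summary
```

The overall 98.9% is dominated by the 295 legal inputs. The illegal-input category, with only 10 cases, carries the lowest rate (80.0%), so its percentage is much less stable than the others.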



B) User Performance 

The measures of user performance are necessarily more subjective than those of system performance. However, these results can provide useful insight into the overall validity of this type of approach to a conversational implementation.

For each user, at least two dialogues are conducted: one with the user having a minimum of system knowledge, and one where he has more instruction and previous experience. On the first try, every user responded properly to the initial system questions and was able to turn on the guide facility. Then, using the guide facility, all but one user were able to complete a simple retrieval process successfully. The one exception did not understand the use of the word "default". After this was explained, the operation progressed normally. In general, all users were able to respond properly to the guide questions.

The only major problem occurred at the end of the guided dialogue, where the process is recycled and started again. At this point it was not obvious to the user how he could sign off. But most users knew enough to request a tutorial, which then explicitly displayed the available options, SIGNOFF being one of them. An example of this situation appears in Figure 6A. A slight modification of the final guide process can rectify this.

Being guided through a retrieval operation gives the user a great deal of insight into the use of the system. Using this experience and a small amount of added instruction to fill in any areas not touched by the guide facility, the user next attempts a normal (unguided) dialogue. All of the users tested were able to conduct a reasonable dialogue without outside help. A few of the users who had previous information retrieval experience were able to perform a highly competent retrieval after only a single introductory guided process. Of course, nearly all of the users became stuck at some point and had to request a tutorial. Of the 32 tutorial calls made by all users, all but one supplied the information necessary for the user to continue. In some cases where the user received the master status tutorial, the single message answered the user's questions; he was then able to continue by making several references back to the same message. The one situation in which the tutorial did not help occurred when a user requested a tutorial during a guide process. Since the guide facility operates by generating successive tutorial messages, the user's request resulted in a repeat of the previously printed message, so the tutorial presented no new information. The user, however, was able to extricate himself by requesting a default option. In all the dialogues, there was no case in which a user was forced to stop because he became hopelessly lost.

At the conclusion of each dialogue, the user is asked his opinion of the system. The reaction of nearly all the users was favorable. They found the system both simple to learn and simple to use. The tutorial facility is very well received, especially the convention of printing the appropriate tutorial in response to an erroneous input. Most of the critical comments centered on revisions to the wording of the various messages. A few of these messages are felt to be insufficiently clear to a new user. One user suggested that tutorials not only explain their options but also provide some samples of appropriate valid inputs. This comment, however, appears to be based on user timidity more than anything else. Unlike the others, this user did not fully appreciate the natural language capabilities of the system and was afraid of submitting an erroneous input. He therefore wanted the sample input as a highly structured guideline for his input. But because the system can treat natural language, such guidelines are unnecessary.
The overall feeling of the users is that the system provides an easy-to-use yet sufficiently rigorous conversational information retrieval facility. In addition, the conversational control dialogue can be performed at each user's particular level of competence.

C) Timing 

No analysis of a potential on-line system is complete without saying something about processing time. The current conversational program is written in FORTRAN and contains a great deal of diagnostic processing and output, as well as other debugging aids. Its timing statistics may therefore be somewhat worse than could be achieved using more efficient production programming techniques. However, these results do give a general idea of the processing speed. The time for each operation varies from about 50 to 150 milliseconds. The complete set of 376 operations is performed in 37.057 seconds, or about 0.1 second per input operation.

When considering an actual console user, a rather conservative estimate for the average time between inputs (that is, the time between end-of-input signals) is 10 seconds. In practice this average is probably higher. Each console therefore contributes at most about one input every 10 seconds, so 100 consoles together generate about 10 inputs per second. Thus, at the rate of 10 conversational operations per second, the current system could adequately support a network of 100 consoles and supply a response time of one second or better. Even with the inefficient code and conservative estimates, this is clearly within practical limits.
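The throughput side of this estimate can be reproduced directly from the measured figures (37.057 seconds for 376 operations, 10 seconds between inputs per console). This is a back-of-envelope sketch; the function name `capacity` is illustrative. It models throughput only and does not model queueing delay, so it checks the console count rather than the response-time claim.

```python
from typing import Tuple


def capacity(total_seconds: float, operations: int,
             seconds_between_inputs: float) -> Tuple[float, float, float]:
    """Back-of-envelope throughput estimate from the measured run."""
    per_operation = total_seconds / operations      # average time per input
    operations_per_second = 1.0 / per_operation     # sustainable input rate
    # Each console submits one input every `seconds_between_inputs` seconds,
    # so N consoles offer N / seconds_between_inputs inputs per second.
    consoles = operations_per_second * seconds_between_inputs
    return per_operation, operations_per_second, consoles


per_op, rate, consoles = capacity(37.057, 376, 10.0)
print(f"{per_op:.4f} s/input, {rate:.1f} inputs/s, about {consoles:.0f} consoles")
```

The unrounded figures are roughly 0.099 second per input and just over 10 inputs per second, which is where the estimate of about 100 consoles comes from.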
6. Future Extensions

There are a number of areas for future study with respect to the conversational system. First is a user storage facility. With this capability, a user could store various aspects of his dialogue, such as queries or retrieved documents, for future use. In addition, a user could store parameters which would be automatically set at sign-on time. This would eliminate the need to specify the parameters each time he used the system. The system could also keep various statistics about its own performance, which would be valuable in evaluating and improving the system.
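Stored sign-on parameters amount to overlaying a user's saved values on the system defaults. The following is a minimal sketch of that idea. `ProfileStore`, its methods, and the parameter names are hypothetical. The defaults (cosine correlation, clustered search, no collection) are taken from the present-status messages in Fig. 6C, and persistence to an actual storage device is left out.

```python
from typing import Dict, Optional

# System defaults visible in Fig. 6C: cosine correlation and clustered search.
SYSTEM_DEFAULTS: Dict[str, Optional[str]] = {
    "collection": None,        # no collection until the user picks one
    "correlation": "COSINE",
    "search": "CLUSTERED",
}


class ProfileStore:
    """Hypothetical per-user storage of parameters applied at sign-on."""

    def __init__(self) -> None:
        self._profiles: Dict[str, Dict[str, Optional[str]]] = {}

    def save(self, user: str, **params: Optional[str]) -> None:
        """Remember parameter values for a user, rejecting unknown names."""
        unknown = set(params) - set(SYSTEM_DEFAULTS)
        if unknown:
            raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
        self._profiles.setdefault(user, {}).update(params)

    def sign_on(self, user: str) -> Dict[str, Optional[str]]:
        """Start from the system defaults and overlay the user's stored values."""
        settings = dict(SYSTEM_DEFAULTS)
        settings.update(self._profiles.get(user, {}))
        return settings
```

A user who always searches the physics collection with a full search would save those two values once. Every later sign-on would then start from that status instead of the system defaults, while the unchanged correlation default still applies.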

Carrying the storage capability one step further, the conversational system could be equipped with a learning subsystem. A user could then specify his own notation along with more conventionally stated equivalents, and the system would learn the user's special requirements. In this way a user could tailor the conversational system to his exact needs and conventions. The learning process could also be used in the treatment of erroneous inputs, as shown in the sample script below. The user erroneously requests a nonexistent "bool" correlation. The system notifies him of his error and asks him to clarify the input and to indicate whether it should be learned. After he answers affirmatively, the user may use "overlap" or "bool" interchangeably.

U: BOOL CORRELATION

INCORRECT CORRELATION, PLEASE CLARIFY AND
INDICATE IF INPUT SHOULD BE LEARNED.

U: YES, OVERLAP

UNDERSTOOD; BOOL = OVERLAP



Thus the learning process provides a way of meeting the particular needs of each individual user.
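At its simplest, the proposed learning subsystem is a per-user table of aliases that is consulted before an input word is rejected. The sketch below assumes exactly that. The class `AliasLearner` and its methods are hypothetical, and the clarification dialogue itself (asking whether to learn the input) is left to the caller. It reproduces the "bool"/"overlap" example from the script above.

```python
from typing import Dict, Iterable, Optional


class AliasLearner:
    """Hypothetical learning subsystem mapping a user's notation to system terms."""

    def __init__(self, known_terms: Iterable[str]) -> None:
        self.known = {t.upper() for t in known_terms}
        self.aliases: Dict[str, str] = {}   # learned word -> canonical term

    def resolve(self, word: str) -> Optional[str]:
        """Return the canonical term, or None if the word is unrecognized."""
        w = word.upper()
        if w in self.known:
            return w
        return self.aliases.get(w)

    def learn(self, alias: str, canonical: str) -> None:
        """Record that `alias` should be treated exactly like `canonical`."""
        target = self.resolve(canonical)
        if target is None:
            raise ValueError(f"{canonical!r} is not a known term")
        self.aliases[alias.upper()] = target
```

Before learning, `resolve("BOOL")` returns None, which is the point at which the system would print its clarification request. After `learn("BOOL", "OVERLAP")`, both words resolve to OVERLAP. Keeping one learner per user preserves each user's private conventions.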

Some further work must also be done with respect to user terminals. Currently the most popular on-line communication device is the teletype console. These are easy to use and relatively inexpensive. Their most serious drawback is their slow output speed. A fairly simple tutorial may take 30 seconds or more to print. This can frustrate the user and needlessly tie up the terminal. Another type of terminal is based on a cathode ray tube (CRT). These permit almost instantaneous display of messages. In addition, part of the screen may be devoted to a prompting area. In this way the user always knows where he is in his dialogue and what options are currently available. Some CRT units have a light pen, which allows selection of options by merely pointing the pen at the name of the desired option on the screen. However, there are several problems with CRT displays. First, the added hardware needed to drive a CRT makes them very expensive. Some work is being done by Bitzer [1] on the design of an inexpensive visual display unit which uses a plasma screen and slide projector. However, these are not yet commercially available. Also, the CRT produces no hard copy, so a user might have to copy a long list of document numbers from the screen. The solution to this may be supplied by devices which contain both a visual and a hard copy facility. The user conducts his dialogue on the CRT. Whenever he receives something he wants saved, he indicates the appropriate subset of the script, which is then printed. Such a device is currently being used experimentally by the RIQS System at Northwestern University [13].

Another area for future study is the manner in which documents are displayed to the user. SMART and a number of other systems normally display only the document number. At best, document numbers provide minimal information about the document's content. It might be better to store document titles or even abstracts on-line so that they may be seen by the user. This could best be done using a high-capacity, low-speed peripheral storage device. However, the expense of the dedicated storage device, along with the prospect of having the terminal tied up printing abstracts, may make this technique uneconomical.

Another possibility is to store document abstracts on microfiche. A set of microfiche and a reader would be supplied at each terminal station. The user would get a list of document numbers from the information retrieval system and then look them up off-line at the reader. Not only is the physical equipment for this cheaper than an on-line file, but scanning the abstracts off-line also frees the terminal for more useful work.

The fourth and probably most significant area for future development is the analysis of the conversational user. It is from this type of study that significant advances will come in tailoring systems to the actual needs of the system user.



7. Conclusion

Conversational information processing has many advantages over conventional batch methods. In this study it is shown that it is quite reasonable to conduct conversational information retrieval in a natural language framework. Furthermore, the template analysis process proves to be a useful technique not only for handling the natural language input to a conversational system but also for taking care of the bookkeeping. Actual user experimentation shows that the conversational system implemented using these techniques provides an excellent communication medium between man and machine.